Psychometrika 


VOLUME XVIII-1953
JANUARY-DECEMBER
Chairman: HAROLD GULLIKSEN
Editors: M. W. RICHARDSON, PAUL HORST
Managing Editor: DOROTHY C. ADKINS
Assistant Managing Editor: B. J. WINER


Editorial Board

R. L. ANDERSON
J. B. CARROLL
H. S. CONRAD
L. J. CRONBACH
E. E. CURETON
ALLEN EDWARDS
MAX D. ENGELHART
HENRY E. GARRETT
J. P. GUILFORD
HAROLD GULLIKSEN
CHARLES M. HARSH
PAUL HORST
ALSTON S. HOUSEHOLDER
LYLE V. JONES
TRUMAN L. KELLEY
ALBERT K. KURTZ
IRVING LORGE
QUINN MCNEMAR
FREDERICK MOSTELLER
GEORGE E. NICHOLSON
M. W. RICHARDSON
WM. STEPHENSON
GODFREY THOMSON
R. L. THORNDIKE
L. L. THURSTONE
LEDYARD TUCKER
S. S. WILKS

PUBLISHED QUARTERLY 


BY THE PSYCHOMETRIC SOCIETY


1407 SHERWOOD AVENUE
RICHMOND 5, VIRGINIA

PSYCHOMETRIKA, VOL. 18, No. 1
MARCH, 1953


TEST RELIABILITY AND EFFECTIVE TEST LENGTH*


WILLIAM H. ANGOFF


EDUCATIONAL TESTING SERVICE


Measures of effective test length are developed for speeded and power
tests, which are independent of the number of items in the test or of the time
required for administration. These measures are used in determining
reliability for (1) speeded and power tests, where a separately timed short
parallel form is administered in addition to the full-length test; (2) power
tests, where a subset of items is imbedded within the total test, parallel to
the total test; and (3) power tests, where the subset of items is correlated
with the complementary parallel subset in the test.


In a previous article, Cronbach (1) has pointed out that the characteristics
of mental measurement that make the estimation of error particularly
difficult are two-fold. First is that the very act of measuring produces a
noticeable change in the object measured. The task of responding to test
items, particularly items of a cognitive nature, is in itself a learning task, and
on a second administration there is a variable positive bias in test performance
which is generally attributed to increased test wisdom or to more specific
acquaintance with test content. Second is the fact that uncontrolled changes
during the process of measurement, as well as changes associated with growth
and senescence (or learning and forgetting), also produce a changed
performance on the second administration. In both cases the changed
performance can be interpreted, in the context of test reliability, only as variable
error unassociated with the reliability of the measuring instrument, and
operating to reduce the size of the reliability coefficient.

In order to avoid attenuating the reliability coefficient with experimental
error resulting from a second administration, methods have been developed
for measuring reliability through the use of statistics taken from a single test
administration. In general, two such methods have been made available,
the Kuder-Richardson formulas and the split-half method (with
Spearman-Brown correction for half length), as well as variants of these later
developed.

While these methods have yielded relatively satisfactory results for
power tests, where sufficient time is given for all examinees to attempt all
items, they have been considered totally inadequate for speeded tests.
Guilford (3, 486) and Thorndike (7, 582), for example, have pointed out that
an odd-even split of items in a purely speeded test would yield a correlation
between test halves of unity, regardless of the reliability of the test. On the
other hand, assuming that all examinees complete the first half of the test, a
split of the first half against the second half would yield an indeterminate
correlation, since the variability on the first half would be zero. In general,
then, the computed reliability will be largely a function of the manner in
which the test split has been made, and will tend not to reflect the actual
reliability of the test in terms of the theoretical parallel-forms coefficient.


*The writer gratefully acknowledges the assistance of Dr. Ledyard R Tucker in the
formulation of several of the concepts presented in this paper. He wishes also to express his
appreciation for the helpful comments of Dr. Harold Gulliksen and Dr. Frederic M. Lord
in their review of the manuscript.

The Kuder-Richardson formulas are similarly inadequate for speeded 
tests. In speeded tests, where discrimination among examinees is made in 
terms of the differential number of items answered in a specified length of 
time, the inter-item covariances within a test are higher than they would be 
between parallel items on different forms of the test (7, 588). Since the 
reliability of the total test is a direct function of the reliabilities of the indi- 
vidual items (measured in this case in terms of inter-item correlations), the 
value of the reliability coefficient for the total test is thereby inflated. 

In view of the inadequacies of the reliability formulas, it appears that 
there are at present no single-administration techniques for estimating the 
reliability of speeded tests. Guttman (5), in fact, maintains that reliability 
in general cannot be estimated from a single trial, and that all single-trial 
reliabilities are, in effect, lower bounds. Cronbach and Warrington (2) and 
Gulliksen (4) have developed lower-bound estimates of the reliability of 
speeded tests, but precise single-administration techniques are not available.
Guilford (3, 486, 487) suggests the application of a split-half technique 
in which both test halves are given in separately timed administrations in 
immediate succession. (One of the difficulties of this method that first comes 
to mind is the matter of deciding on the appropriate time limits for the 
separate halves which would match the degree of speededness of the total test 
given in one administration.) The only alternative method is to devise an 
additional full-length parallel speeded test and to obtain an equivalent-form 
correlation. This procedure raises at least two problems: The first, dis- 
cussed by Cronbach and Warrington, is the expense of constructing an 
alternate form solely for the purpose of providing reliability coefficients for 
a published test. The second problem relates to the questionable assump- 
tion that the parallel test is truly of the same effective length as the original 
test, merely because the numbers of items and the scheduled test times for 
the two tests are equal. In the case of speeded tests, variations in the amounts 
of time necessary to answer the items will cause substantial variations in the 
effective lengths of the tests. 


The purpose of the present paper is first to suggest that the problem of
economy in obtaining the reliability of a speeded test may be at least partly
solved by administering a short parallel form in addition to the regular test.
Second, the purpose is to provide a measure of functional or effective test
length and incorporate that measure into the reliability coefficient. In
a later section of this paper, corresponding methods will be discussed for
computing the reliability of unspeeded tests where the short parallel form is
imbedded within the regular test, and only one administration of the test
is given. In the latter case, the reliability is probably better interpreted
as a lower-bound reliability or an index of internal consistency.

Case I. The determination of the reliability, $r_{tt}$, of test $t$ from the
correlation between test $t$ and test $i$, a separately timed test, parallel to test
$t$. While the more stringent case of speeded tests is treated, the method
applies equally well to the case of unspeeded tests.

We shall consider that a short test, $i$, has been devised to parallel in
function, level and spread of item difficulty, and items per unit of time a
long test, $t$, which is speeded and for which a test reliability is to be
determined. In connection with the requirement of parallelism it is assumed that
the tests have been equated for spuriousness, in the sense that Cronbach and
Warrington (2, 169) have used the term. In their paper they point out
that in an unspeeded test “especial difficulty on one of the items neither
increases nor decreases the person’s probable standing on the remainder. But
in a timed test, the person who gets stuck on one item may never reach the
remainder of the items. It is this interdependence of items that introduces
spuriousness.” Finally, it is considered that, contained in test $t$ there are $n$
tests, $j$, of effective length $i$, all parallel to test $i$. The correlation between


tests $i$ and $t$ is given by:

$$r_{it} = r_{i(a+b+\cdots+j+k+\cdots+n)} = \frac{\sum_{j=1}^{n} C_{ij}}{\sigma_i \sigma_t} = \frac{n C_{ij}}{\sigma_i \sigma_t}, \tag{1}$$

where $\sum C_{ij}$ is the sum of the covariances $r_{ij}\sigma_i\sigma_j$ between test $i$ and each of
the parallel forms $j$ of effective length $i$ contained in test $t$. The value of $n$
is the number of tests of effective length $i$ contained in $t$, or the ratio of
effective lengths, $t$ to $i$.

The variance of test $t$ may be written:

$$\sigma_t^2 = \sum_{j=1}^{n} \sigma_j^2 + \sum_{j}\sum_{k} C_{jk} = n\sigma_j^2 + n(n-1)C_{jk} \qquad (j \neq k). \tag{2}$$

Two assumptions underlie the development of the formulations to follow:
(a) the variances and covariances of parallel tests of equivalent length are
equal, so that $C_{ij} = C_{jk}$; (b) any one variance (or covariance) is equal to
the average of all other variances (or covariances) involving parallel tests of
equivalent length, so that $\sigma_i^2 = \bar{\sigma}_j^2$, $C_{ij} = \bar{C}_{ij}$, and
$C_{jk} = \bar{C}_{jk}$.

Then, solving (1) for $C_{ij}$, substituting in (2) for $C_{jk}$, replacing $\sigma_j^2$ by
$\sigma_i^2$, and solving (2) for $n$, we have

$$n = \frac{\sigma_t(\sigma_t + r_{it}\sigma_i)}{\sigma_i(\sigma_i + r_{it}\sigma_t)}. \tag{3}$$

Equation (3) yields a value of n which is determined not from the 
arbitrary ratio of the numbers of items in the two tests or from the ratio of 
time lengths, but from the data yielded by the test experiment itself. Par- 
ticularly in speeded tests, neither the ratio of time lengths nor the ratio of 
numbers of items is suitable for estimating effective n. For one thing, as- 
suming that no one completes the test, and that speed is the primary source 
of test variance, the distribution of test scores is highly sensitive to changes 
in total time limit as well as to changes in spuriousness (see above), but 
not at all sensitive to the addition of test items. Secondly, extraneous 
factors such as the period of warm-up at the beginning of the test would 
operate to reduce the effective test time in the short test to a greater extent, 
proportionally, than in the long test. Consequently, it would seem appropri- 
ate that a measure of effective test length be used in estimating reliability, 
such as that expressed in equation (3) rather than the ratio of the numbers 
of test items or the ratio of test times. 

It may be of some interest to note that if $r_{it} = 1.00$, then $n = \sigma_t/\sigma_i$,
and that if $r_{it} = .00$, then $n = \sigma_t^2/\sigma_i^2$. Consequently, we can establish that
$\sigma_t^2/\sigma_i^2 \geq n \geq \sigma_t/\sigma_i$. It may also be observed from equation (3) that if the
standard deviations of the tests are equal, then $n = 1$, and the tests are of
equivalent length.
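The behavior of equation (3) can be checked numerically. The following is a minimal sketch in Python; the function name `effective_length` and the standard deviations used are illustrative assumptions and are not taken from the data reported in this paper.

```python
def effective_length(sd_t: float, sd_i: float, r_it: float) -> float:
    """Ratio n of effective lengths, long test t to short test i, per equation (3)."""
    return (sd_t * (sd_t + r_it * sd_i)) / (sd_i * (sd_i + r_it * sd_t))


# Hypothetical standard deviations (not from Table 1).
sd_t, sd_i = 10.0, 3.0
lower, upper = sd_t / sd_i, (sd_t / sd_i) ** 2  # bounds reached at r_it = 1 and r_it = 0
for r_it in (0.0, 0.5, 0.8, 1.0):
    n = effective_length(sd_t, sd_i, r_it)
    inside = lower - 1e-12 <= n <= upper + 1e-12  # small tolerance for rounding
    print(f"r_it = {r_it:.2f}  n = {n:.3f}  within bounds: {inside}")
```

With these values $n$ falls from $\sigma_t^2/\sigma_i^2$ to $\sigma_t/\sigma_i$ as $r_{it}$ rises from zero to unity, and equal standard deviations give $n = 1$ for any $r_{it}$.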


It will be convenient at this point to state the reliability of test $t$ in
terms of its correlation with test $i$. Consider that test $t$ is correlated with a
parallel test of equivalent length, composed of $n$ tests of length $i$:

$$r_{tt} = r_{(a+b+\cdots+n)(a'+b'+\cdots+n')} = \frac{\sum\sum C_{ij}}{\sigma_t^2} = \frac{n^2 \bar{C}_{ij}}{\sigma_t^2}, \tag{4}$$

where $\sum\sum C_{ij} = n^2 \bar{C}_{ij}$. In accordance with the assumption of equal
average covariances stated above, $\bar{C}_{ij} = C_{ij}$. Thus,

$$r_{tt} = \frac{n^2 C_{ij}}{\sigma_t^2} = \frac{n\, r_{it}\sigma_i}{\sigma_t}. \tag{5}$$

Substituting (3) in (5), we find

$$r_{tt} = \frac{r_{it}(\sigma_t + r_{it}\sigma_i)}{\sigma_i + r_{it}\sigma_t}. \tag{6}$$



Equation (6) gives a method for determining the reliability of a test
from its correlation with a parallel test, not necessarily of the same length.
In examining the practicability of equation (6) it is observed that this is
the formula to be used when estimating the reliability of a test from the
correlation of any two parallel tests, even those presumed to be of the same
effective length. If the standard deviations of the two tests are equal, and
the tests are of equivalent length, then the reliability, $r_{tt}$, is identical to
$r_{it}$, the correlation between the two tests. However, if the standard
deviations are unequal, and the tests are incorrectly presumed to be of equivalent
length, then the correlation between the two tests will be different from the
reliability of either test. In effect, the value of $n$ must be considered and
incorporated into the determination of reliability, and it would be necessary
to decide beforehand whether the reliability of test $t$ is to be determined, or
the reliability of test $i$. If the standard deviations of the two tests are
different, then different results will be found.
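A short computational sketch of equation (6) may make the point concrete. It is written in Python under assumed, purely illustrative values; the function names are hypothetical. It shows that (6) agrees with (5) used in conjunction with (3), and that exchanging the roles of the two tests yields a different reliability when the standard deviations differ.

```python
def reliability_from_parallel(sd_a: float, sd_b: float, r_ab: float) -> float:
    """Reliability of test a from its correlation with a parallel test b, per equation (6)."""
    return r_ab * (sd_a + r_ab * sd_b) / (sd_b + r_ab * sd_a)


def effective_length(sd_t: float, sd_i: float, r_it: float) -> float:
    """Ratio of effective lengths, t to i, per equation (3)."""
    return sd_t * (sd_t + r_it * sd_i) / (sd_i * (sd_i + r_it * sd_t))


# Hypothetical values: long test t, short test i.
sd_t, sd_i, r_it = 10.0, 3.0, 0.8

r_tt = reliability_from_parallel(sd_t, sd_i, r_it)   # reliability of the long test
r_ii = reliability_from_parallel(sd_i, sd_t, r_it)   # roles exchanged: the short test
n = effective_length(sd_t, sd_i, r_it)
r_tt_via_5 = n * r_it * sd_i / sd_t                  # equation (5) with n from (3)

print(f"r_tt = {r_tt:.4f}, r_ii = {r_ii:.4f}, r_tt via (5) and (3) = {r_tt_via_5:.4f}")
```

When the two standard deviations are set equal, the function returns $r_{it}$ itself, which is the equivalent-length case described above.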

Particular emphasis should be given to the basic assumption inherent
in the present formulations: tests $i$ and $t$ must be parallel tests. If that
assumption is violated in a choice of a non-parallel test $i$, then the reliability
of test $t$ may well be grossly underestimated.

Finally, it may be observed that if (1) is substituted in (5),

$$r_{tt} = \frac{n^2 C_{ij}}{\sigma_t^2}. \tag{7}$$

If it is assumed that $C_{ij} = r_{ii}\sigma_i^2$, then

$$n = \frac{\sigma_t \sqrt{r_{tt}}}{\sigma_i \sqrt{r_{ii}}}. \tag{8}$$

The value of $n$ is seen to be the ratio of the standard deviations of true scores
in the (mutually exclusive) long and short tests.
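As an illustrative check (a sketch only, with hypothetical values and function names), the value of $n$ from equation (8) can be compared with the value from equation (3), taking both reliabilities from equation (6):

```python
import math


def reliability(sd_a: float, sd_b: float, r_ab: float) -> float:
    """Reliability of test a from its correlation with parallel test b, equation (6)."""
    return r_ab * (sd_a + r_ab * sd_b) / (sd_b + r_ab * sd_a)


def n_from_observed(sd_t: float, sd_i: float, r_it: float) -> float:
    """Effective length ratio from observed quantities, equation (3)."""
    return sd_t * (sd_t + r_it * sd_i) / (sd_i * (sd_i + r_it * sd_t))


def n_from_true_sds(sd_t: float, sd_i: float, r_tt: float, r_ii: float) -> float:
    """Effective length ratio as a ratio of true-score standard deviations, equation (8)."""
    return (sd_t * math.sqrt(r_tt)) / (sd_i * math.sqrt(r_ii))


# Hypothetical values for a long test t and a short parallel test i.
sd_t, sd_i, r_it = 10.0, 3.0, 0.8
r_tt = reliability(sd_t, sd_i, r_it)
r_ii = reliability(sd_i, sd_t, r_it)

n3 = n_from_observed(sd_t, sd_i, r_it)
n8 = n_from_true_sds(sd_t, sd_i, r_tt, r_ii)
print(f"n from (3) = {n3:.4f}, n from (8) = {n8:.4f}")
```

The two routes give the same $n$ for these values, as the algebra requires whenever both reliabilities come from (6).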

Case II. The determination of the reliability, $r_{tt}$, of an unspeeded test,
$t$, from the correlation between test $t$ and a subset of items, test $j$, included
in $t$.

If test $t$ is not speeded and the principal source of test variance is
the differential abilities of the examinees to respond correctly to the items,
then a single test administration is capable of yielding an internal consistency
reliability coefficient. Let us consider that there exists and can be chosen
a subset of items, test $j$, contained in test $t$, that parallel the parent test in
function and difficulty. Further, consider that there are $n$ such parallel
subtests contained in test $t$, all mutually exclusive. Then, making use of
the same assumptions of equivalence as were made for equation (1) above,*
we can state the correlation between test $j$ and its parent test, $t$, as follows:

$$r_{jt} = r_{j(a+b+\cdots+j+k+\cdots+n)} = \frac{\sigma_j^2 + \sum_{k \neq j} C_{jk}}{\sigma_j \sigma_t} = \frac{\sigma_j^2 + (n-1)C_{jk}}{\sigma_j \sigma_t} \qquad (j \neq k). \tag{9}$$

*Except for equating the characteristic of spuriousness. See statement of Cronbach
and Warrington quoted above.



We have observed that

$$\sigma_t^2 = n\sigma_j^2 + n(n-1)C_{jk} \qquad (j \neq k). \tag{2}$$

If we now solve (9) for $C_{jk}$, substitute in (2), and replace the average
subtest variance by its equivalent, $\sigma_j^2$,

$$n = \frac{\sigma_t}{r_{jt}\sigma_j}. \tag{10}$$

Now solving equation (2) for $C_{jk}$ and substituting in (7) for its equivalent,
$C_{ij}$,

$$r_{tt} = \frac{n(\sigma_t^2 - n\sigma_j^2)}{(n-1)\sigma_t^2}, \tag{11}$$

which is exactly parallel to Kuder and Richardson's formula (20). Finally,
if $\sigma_j^2$ is substituted for its equivalent, the average subtest variance, and the
value found in (10) is substituted for $n$,

$$r_{tt} = \frac{r_{jt}\sigma_t - \sigma_j}{r_{jt}(\sigma_t - r_{jt}\sigma_j)}. \tag{12}$$

Equation (12) gives the reliability of an unspeeded test, $t$, obtained from
the correlation between $t$ and its parallel subtest $j$, and their standard
deviations.
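Equations (10), (11), and (12) can be illustrated with a small Python sketch. The input values are hypothetical: they are constructed to correspond to a test made up of four parallel quarters, each with unit standard deviation and intercorrelations of .50, and the function names are assumptions made for illustration.

```python
import math


def contained_length(sd_t: float, sd_j: float, r_jt: float) -> float:
    """Effective length ratio n when subtest j is contained in t, equation (10)."""
    return sd_t / (r_jt * sd_j)


def reliability_from_subtest(sd_t: float, sd_j: float, r_jt: float) -> float:
    """Reliability of t from its correlation with contained subtest j, equation (12)."""
    return (r_jt * sd_t - sd_j) / (r_jt * (sd_t - r_jt * sd_j))


def reliability_eq11(sd_t: float, sd_j: float, n: float) -> float:
    """Reliability of t in the form parallel to Kuder-Richardson formula (20), equation (11)."""
    return n * (sd_t ** 2 - n * sd_j ** 2) / ((n - 1) * sd_t ** 2)


# Hypothetical test: four parallel quarters, SD 1.0 each, intercorrelations .50.
sd_j = 1.0
sd_t = math.sqrt(4 + 4 * 3 * 0.5)      # total-test SD from equation (2)
r_jt = (1 + 3 * 0.5) / sd_t            # subtest-total correlation from equation (9)

n = contained_length(sd_t, sd_j, r_jt)
r_tt_12 = reliability_from_subtest(sd_t, sd_j, r_jt)
r_tt_11 = reliability_eq11(sd_t, sd_j, n)
print(f"n = {n:.3f}, r_tt by (12) = {r_tt_12:.3f}, r_tt by (11) = {r_tt_11:.3f}")
```

For this constructed case the recovered $n$ is 4, and (11) and (12) give the same reliability, as they must, since (12) is (11) with $n$ replaced by its value from (10).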


Case III. The determination of the reliability, $r_{tt}$, of an unspeeded test,
$t$, from the correlation between its complementary parallel parts, $h$ and $j$,
and their standard deviations.

It will be observed that equation (12) may be written as follows:

$$r_{tt} = \frac{(r_{jt}\sigma_t - \sigma_j)/\sigma_{(t-j)}}{r_{jt}\,(\sigma_t - r_{jt}\sigma_j)/\sigma_{(t-j)}}. \tag{13}$$

If $h = t - j$,

$$r_{tt} = \frac{r_{hj}}{r_{ht}\, r_{jt}}, \tag{14}$$

and

$$r_{tt} = \frac{r_{hj}\sigma_t^2}{(\sigma_h + r_{hj}\sigma_j)(\sigma_j + r_{hj}\sigma_h)}. \tag{15}$$



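The value of $\sigma_t^2$ may be taken from the following expression,

$$\sigma_t^2 = \sigma_h^2 + \sigma_j^2 + 2r_{hj}\sigma_h\sigma_j,$$

and substituted in (15) to yield

$$r_{tt} = \frac{r_{hj}(\sigma_h^2 + \sigma_j^2 + 2r_{hj}\sigma_h\sigma_j)}{(\sigma_h + r_{hj}\sigma_j)(\sigma_j + r_{hj}\sigma_h)}, \tag{16}$$

so that all values used are taken from the subtest scores.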
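A brief Python sketch of equation (16) follows, using hypothetical part-score statistics and illustrative function names. It also evaluates the Spearman-Brown correction for half length, so that the two can be compared when the parts have equal standard deviations.

```python
def split_reliability(sd_h: float, sd_j: float, r_hj: float) -> float:
    """Reliability of the total test from two complementary parallel parts, equation (16)."""
    var_t = sd_h ** 2 + sd_j ** 2 + 2 * r_hj * sd_h * sd_j
    return r_hj * var_t / ((sd_h + r_hj * sd_j) * (sd_j + r_hj * sd_h))


def spearman_brown_half(r_hj: float) -> float:
    """Spearman-Brown correction for half length."""
    return 2 * r_hj / (1 + r_hj)


# Hypothetical splits: one with equal part SDs, one with unequal part SDs.
equal_parts = split_reliability(4.0, 4.0, 0.6)
unequal_parts = split_reliability(5.0, 3.0, 0.6)
print(f"equal SDs:   (16) = {equal_parts:.4f}, Spearman-Brown = {spearman_brown_half(0.6):.4f}")
print(f"unequal SDs: (16) = {unequal_parts:.4f}")
```

With equal standard deviations the two values coincide; with unequal standard deviations equation (16) departs from the Spearman-Brown value because the parts are no longer of equal effective length.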
It may be noted from (16) that if the test split has been made in such
a way as to produce parallel tests of equal effective length (that is, when $h$
and $j$ are equivalent tests and $\sigma_h = \sigma_j$), then

$$r_{tt} = \frac{2r_{hj}}{1 + r_{hj}},$$

which is the familiar Spearman-Brown correction for half length.

To complete the analogy between $i$-exclusive-of-$t$ and $j$-contained-in-$t$:
It is clear that the counterpart, for the “contained” case, of equation (8)
(where $n$ is expressed as the ratio of the standard deviations of true scores,
$t$ to $i$), is directly analogous to the “exclusive” case. If it is assumed that
$r_{jj}\sigma_j^2 = r_{ii}\sigma_i^2$, then equation (8) may be restated:

$$n = \frac{\sigma_t \sqrt{r_{tt}}}{\sigma_j \sqrt{r_{jj}}}. \tag{17}$$


In the case of power tests, test length has usually been measured in
terms of the number of items. However, if the items near the beginning of
the test are correctly answered by everyone in the group, or if the items near
the end of the test are correctly answered by no one in the group, then the
test is obviously not effectively of the length arbitrarily assumed. Some
measure of test length should be used such as that implied in (17), which
takes into account the number of items effectively discriminating among
the members of the tested group.


It may be argued that if the short test is ideally chosen with respect to
level and range of item difficulty, then the value of $n$ will remain constant,
irrespective of the performance of the particular group. However, since the
ideal is not achieved in practice, it is necessary to determine the value of $n$
in the particular instance. In effect, the direct determination of effective $n$
allows a greater degree of laxity in the choice of items for the subtest, but
does become a necessary adjunct to the determination of reliability.
Particularly important is the fact that the choice of items for the subtest need not
be restricted by any arbitrary prior decision regarding its length, since its
length would be determined in conjunction with the determination of the
reliability. With that restriction removed, greater freedom can be devoted
to making the subtest truly parallel in function and distribution of item
difficulty.

It may be well to repeat that throughout these formulations it is assumed
that the subtest of items, $j$, is parallel to the long test, $t$. If this assumption
is not met in practice, then the reliability of test $t$ will be underestimated.

It will be of some interest to examine the relationship among $r_{tt}$, $r_{ii}$,
and $r_{it}$, where test $i$ is exclusive of $t$, and to compare that relationship with
that found among $r_{tt}$, $r_{jj}$, and $r_{jt}$, where $j$ is contained in $t$. If we consider
the "exclusive" case first, we note in equation (1) that

$$r_{it} = \frac{n C_{ij}}{\sigma_i \sigma_t}. \tag{1}$$

If we assume that $C_{ij} = r_{ii}\sigma_i^2$, and substitute in (1) the value of $n$ found in
(8), then

$$r_{it}^2 = r_{ii}\, r_{tt}. \tag{18}$$

Equation (18) has otherwise been obtained by stating the correlation between
parallel forms of the same test, adjusted for attenuation due to unreliability,
and considering that the correlation between true scores on parallel tests is
equal to unity.


Considering the "contained" case, we note in equation (10) that

$$n = \frac{\sigma_t}{r_{jt}\sigma_j}. \tag{10}$$

If the value of $n$ found in equation (17) is substituted in (10), it is found that

$$r_{jt}^2 = \frac{r_{jj}}{r_{tt}}. \tag{19}$$

It is observed in comparing (18) with (19) that the relationship among
$r_{tt}$, $r_{ii}$, and $r_{it}$ in the “exclusive” case is quite different from the relationship
among $r_{tt}$, $r_{jj}$, and $r_{jt}$ in the “contained” case. When the short test is
exclusive of $t$, then $r_{it}^2$ is equal to the product of the reliabilities of the short
and long tests (equation 18); when the short test is contained in $t$, then
$r_{jt}^2$ is equal to the ratio of the reliabilities of the short and long tests (equation
19).


It can be shown that equations (18) and (19) are not inconsistent, if
account is taken of the spuriousness in (19). Since tests $i$ and $j$ are parallel
and of equal length, assume that $r_{ii} = r_{jj}$. Solving (18) for $r_{ii}$ and
substituting in (19), with $r_{ii}$ taking the place of $r_{jj}$, we have

$$r_{tt} = \frac{r_{it}}{r_{jt}}. \tag{20}$$

Now substituting in (20) the values of $r_{it}$ and $r_{jt}$ found respectively in (1)
and (10), and assuming that $C_{ij} = r_{ii}\sigma_i^2$ with $\sigma_i = \sigma_j$, equation (20)
results in an identity.


It may be of interest to examine further the relationship between $r_{jt}$
and $r_{it}$. If tests $i$ and $j$ are parallel and of equivalent length, as has been
assumed throughout this development, it would appear obvious that $r_{jt} >
r_{it}$, because of the spuriousness in $r_{jt}$. The degree of this spuriousness can be
shown in that $r_{jt} = \sqrt{r_{jj} r_{tt}} + k$, where

$$k = \frac{\sigma_j(1 - r_{jj})}{\sigma_t} = \frac{1 - r_{jj}}{\sqrt{n + n(n-1)r_{jj}}}.$$

Assuming that the reliabilities $r_{ii}$ and $r_{jj}$ are equal, then it is clear that
$r_{jt} > r_{it}$, and that the relationship $r_{jt} = \sqrt{r_{jj} r_{tt}}$ cannot hold unless
$r_{jj} = 1.00$.
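Consider that the correlation between true scores on $j$ and $t$ ($j$ included
in $t$) is unity. Then

$$r_{j_\infty t_\infty} = 1.00 = r_{(j - e_j)(t - e_t)} = \frac{\sum x_j x_t - \sum x_j e_t - \sum x_t e_j + \sum e_j e_t}{N \sigma_j \sigma_t \sqrt{r_{jj}\, r_{tt}}}, \tag{21}$$

where, for example, $j_\infty$ and $e_j$ are the true and error components of $x_j$, such
that $x_j = j_\infty + e_j$.

Examining each term separately, we find

$$\frac{\sum x_j x_t}{N} = r_{jt}\sigma_j\sigma_t, \qquad \frac{\sum x_j e_t}{N} = \frac{\sum x_t e_j}{N} = \frac{\sum e_j e_t}{N} = \sigma_{e_j}^2 .$$

Other terms go to zero.
Then

$$1.00 = \frac{r_{jt}\sigma_j\sigma_t - \sigma_{e_j}^2}{\sigma_j \sigma_t \sqrt{r_{jj}\, r_{tt}}}, \tag{22}$$

and

$$r_{jt} = \sqrt{r_{jj}\, r_{tt}} + \frac{\sigma_j (1 - r_{jj})}{\sigma_t},$$

and finally,

$$r_{jt} = \sqrt{r_{jj}\, r_{tt}} + \frac{1 - r_{jj}}{\sqrt{n + n(n-1)r_{jj}}}. \tag{23}$$
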


It has been observed that equation (18) holds only when the short and
long tests are mutually exclusive, and that (19) is applicable when the short
test is contained in the long test. The use of (18) when (19) is justified may
well lead to questionable results. For example, it appears that in their Case
II development, Kuder and Richardson (6) made inappropriate use of
equation (18), since they were dealing with the correlation between an item and
the test in which it was contained.* If their equation (3) is restated in the
present notation (and renumbered a),

$$r_{tt} = \frac{\sigma_t^2 - \sum_{j=1}^{n} p_j q_j + \sum_{j=1}^{n} r_{jj}\, p_j q_j}{\sigma_t^2}, \tag{a}$$

and $r_{tt} r_{jt}^2$ is substituted for $r_{jj}$ instead of $r_{jt}^2 / r_{tt}$, which they used, then

$$r_{tt} = \frac{\sigma_t^2 - \sum_{j=1}^{n} p_j q_j + r_{tt}\sum_{j=1}^{n} r_{jt}^2\, p_j q_j}{\sigma_t^2}. \tag{b}$$

Solving for $r_{tt}$, we find

$$r_{tt} = \frac{\sigma_t^2 - \sum p_j q_j}{\sigma_t^2 - \sum r_{jt}^2\, p_j q_j}, \tag{c}$$

instead of the equation (8) presented in their article,

$$r_{tt} = \frac{\sigma_t^2 - \sum p_j q_j}{2\sigma_t^2} \pm \sqrt{\frac{\sum r_{jt}^2\, p_j q_j}{\sigma_t^2} + \left(\frac{\sigma_t^2 - \sum p_j q_j}{2\sigma_t^2}\right)^2}.$$


A second instance in the Kuder-Richardson article of the inappropriate
use of equation (18) appears in their Case III development, in the step from
their equation (9) to equation (10). They assumed that $r_{jj}\, r_{tt} = r_{jt}^2$, which
does not hold if item $j$ is contained in test $t$. If their equation (9) is restated
in the present notation (and renumbered d),

$$r_{tt} = \frac{\left(\sum_{j=1}^{n} \sqrt{r_{jj}\, p_j q_j}\right)^2}{\sigma_t^2}, \tag{d}$$

and $r_{tt} r_{jt}^2$ is correctly substituted for $r_{jj}$, then $r_{tt}$ disappears entirely, and

$$\sigma_t = \sum_{j=1}^{n} r_{jt}\sqrt{p_j q_j}, \tag{e}$$

instead of their equation (10),

$$r_{tt} = \frac{\sum_{j=1}^{n} r_{jt}\sqrt{p_j q_j}}{\sigma_t}.$$

*It should be pointed out that the relationship between $r_{jj}$, $r_{tt}$, and $r_{jt}$ which was
used in the Kuder-Richardson article is not basic to the derivation of their formulas (20)
and (21). There is no implication in the present paper that those formulas need
modification.


Actually, the amount of error incurred when (18) is used instead of
(19) is quite small, if $n$ is large. In general, this is true in the
Kuder-Richardson developments described above, where a single item is correlated with
the entire test. If, for example, $n$ is taken as 100 and $r_{ii}$ (or $r_{jj}$) is taken
as .10, then $r_{it}$ = .303 and $r_{jt}$ = .330, a difference of only .027 between the
spurious and the non-spurious correlations. Similarly, the error in $r_{tt}$ is
small. If the spurious value, $r_{jt}$, is used in (18), then $r_{tt}$ is found to be
.930 instead of .917 if (19) is (properly) used. If, on the other hand, test $t$
is effectively not much longer than test $j$, then the difference in correlations
can be quite appreciable. Suppose, for example, $n$ = 4 and $r_{ii}$ (or $r_{jj}$) =
.50. Then $r_{it}$ = .632 and $r_{jt}$ = .791. If the spurious correlation, $r_{jt}$, is
improperly applied in equation (18), then $r_{tt}$ will appear to be much higher
than it should: .90 instead of .80.
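The figures in this paragraph can be reproduced with a short Python sketch. Two assumptions are made that the text does not spell out: the $n$ parts are taken to have equal variances and a common intercorrelation equal to $r_{jj}$, and the "improper" value is obtained by combining the misapplied equation (18) with the Spearman-Brown relation between $r_{jj}$ and $r_{tt}$. The function name is hypothetical.

```python
import math


def spurious_comparison(n: int, r_jj: float):
    """Return (r_it, r_jt, proper r_tt, improper r_tt) for n equal parallel parts."""
    s = math.sqrt(n + n * (n - 1) * r_jj)      # total SD in units of the part SD, eq. (2)
    r_it = n * r_jj / s                         # part exclusive of the total, eq. (1)
    r_jt = (1 + (n - 1) * r_jj) / s             # part contained in the total, eq. (9)
    proper = r_jj / r_jt ** 2                   # eq. (19) solved for r_tt
    # Assumed route to the improper value: take r_jj = r_jt^2 / r_tt from (18) and
    # insert it in the Spearman-Brown formula, giving x^2 + (n-1)a x - n a = 0.
    a = r_jt ** 2
    improper = (-(n - 1) * a + math.sqrt(((n - 1) * a) ** 2 + 4 * n * a)) / 2
    return r_it, r_jt, proper, improper


for n, r_jj in ((100, 0.10), (4, 0.50)):
    r_it, r_jt, proper, improper = spurious_comparison(n, r_jj)
    print(f"n = {n:3d}, r_jj = {r_jj:.2f}: r_it = {r_it:.3f}, r_jt = {r_jt:.3f}, "
          f"r_tt proper = {proper:.3f}, improper = {improper:.3f}")
```

Under these assumptions the sketch gives the correlations and reliabilities quoted above for both $n$ = 100 and $n$ = 4, which suggests (but does not establish) that this is how the improper values were obtained.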

Tables 1 and 2 describe the results of some computations which serve to
illustrate the usefulness of the estimates made in equations (6) and (12).
Table 1 relates to the reliability of speeded tests, and Table 2 to power tests.
In Table 1, the results are presented for four replications (with variations)
of an experiment in which four randomly chosen groups of male college-level
students were administered separately-timed speeded tests in mathematics.
Test 1 contained 16 items in free-answer and multiple-choice form,
alternately presented. Tests 2 and 3 were each composed of two tests (37
free-answer items and 37 multiple-choice items), each separately timed. Thus,
scores on Test 1 were derived from 16 items, while scores on Tests 2 and 3
were each derived from 74 items; all three tests contained both free-answer
and multiple-choice items.*

For each of the four groups of students, Table 1 gives three estimates of
reliability of Test 2, and three for Test 3. The value $r_{23}$ may be considered,
as it has in the past, to be the reliability of either test, with the (unwarranted)
presumption that the two tests are of equivalent length. In addition, $r_{22}$ is
estimated from its correlation with the shorter test, 1, adjusted for test
length, and also from its correlation with Test 3, also adjusted for differences
in test length. Similarly, the reliability of Test 3 is estimated from its
correlation with Test 1 and also from its correlation with Test 2. It is seen that
the estimates of reliability for Test 2, as derived from equation (6), are close,
as are the estimates of reliability for Test 3. The largest difference, .027, is
that between the two estimates for the same test in Group C. Even this
difference may be accounted for in part by the lack of complete parallelism
between Tests 2 and 3 and Test 1. (It is recalled that Tests 2 and 3 had two
time limits, while Test 1 had only one.)


*The author wishes to express his appreciation to Dr. L. B. Plumlee for providing
the data summarized in Table 1.


TABLE 1
Reliabilities and Ratios of Effective Test Lengths for Speeded Tests (Case I)
[Table entries are not legible in the source. Column groups: group; N; standard
deviations; correlations; reliability estimates; ratios of effective lengths.]


TABLE 2
Reliabilities and Ratios of Effective Lengths for Unspeeded Tests (Case II)
[Table entries are not legible in the source. Column groups: test; subtest; N;
number of items; standard deviations; $r_{jt}$; $r_{tt}$ from (12); $r_{tt}$ from
Kuder-Richardson (20); ratio of effective lengths.]


The right-hand side of Table 1 describes the ratios of effective test 
lengths, as determined from equation (3). In general, Tests 2 and 3 appear 
to be 3.5 to 4.1 times as long as Test 1, in spite of the fact that they contain 
about 4.6 times as many items as Test 1. It is also observed that in Groups A 
and C Test 3 appears to be effectively longer than Test 2, which accounts for 
the slightly higher estimates of reliability for Test 3. In Group B, the lengths 
are seen to be about equal, while in Group D, Test 3 is the shorter of the 
two. Finally, in the last two columns it is observed that the two independent 
estimates of the ratio of effective length, Test 3 to Test 2, are extremely close. 

Table 2 relates to the reliability of unspeeded tests. Three forms of a
150-item verbal and numerical reasoning test were administered: in the
case of VN-0, to an extremely heterogeneous group of over 2000 examinees,
and in the case of VN-2 and VN-8, to larger, but more homogeneous
groups of examinees. In the case of each test, a sample of 500 cases was
drawn at random from the parent group of examinees. For Test VN-0,
four mutually exclusive subtests were chosen, each composed of slightly less
than one-fourth of the total number of items in the parent test. Scores on
the subtests were then correlated with scores on the total test, yielding the
values in the column headed $r_{jt}$. For Tests VN-2 and VN-8, one
subtest was chosen for each and correlated with its parent test. Estimates of
reliability were then made in each instance in accordance with equation (12)
and also in accordance with Kuder and Richardson's formula (20). It is
seen that in each case the two estimates of reliability are close, differing at
most by .009. It is also observed that the ratios of effective lengths determined
from equation (10) are similar to the ratios of the numbers of items in the
subtest and total test.

In summary, a number of equivalent methods become available for the
determination of the reliability of a whole test without making any arbitrary
assumption of the relative lengths of the subtest and total test. The formulas
presented here in the case of both speeded and power tests make use only
of the general assumptions that the short test or subtest is a truly
representative and parallel miniature of the long test, in terms of item content and
level and spread of item difficulty. The amount by which the correlation
between the short (or subtest) and long test is to be “stepped up” is
determined and incorporated in the equations.

Equation (6) should be used with speeded tests where a separately timed
parallel test is required to determine reliability. The
equivalent alternative is to use equation (5) in conjunction with (3). In the
case of unspeeded tests, three general procedures are possible:

(1) Separate administrations with mutually exclusive but parallel tests,
in which reliability is to be determined from the correlation between the
parallel tests. In this case as in the case of the speeded tests, equation (6)
would be used, or (5) in conjunction with (3).

(2) A single administration in which reliability is to be determined
by breaking off a subtest of items parallel to the total test, and correlating
the subtest score with the total test score. In this case equation (12) seems
to be practicable, since it involves only one correlation and its by-product
standard deviations. The equivalent alternative to (12) is equation (14),
which is concise algebraically, but considerably more laborious.

(3) A single administration in which reliability is to be determined 
by splitting the total test into two parallel subtests (not necessarily of equal 
length), and correlating the subtests. In this case, equations (15) and (16) 
are appropriate. Formula (14) should also be mentioned here, and is ap- 
propriate, except for the reservation noted in the preceding paragraph. 


REFERENCES
1. Cronbach, L. J. Test "reliability": its meaning and determination. Psychometrika,
1947, 12, 1-16.
2. Cronbach, L. J., and Warrington, W. G. Time-limit tests: estimating their reliability
and degree of speeding. Psychometrika, 1951, 16, 167-188.
3. Guilford, J. P. Fundamental statistics in psychology and education (2nd ed.). New
York: McGraw-Hill Book Co., 1950.
4. Gulliksen, H. The reliability of speeded tests. Psychometrika, 1950, 15, 259-269.
5. Guttman, L. A basis for estimating test-retest reliability. Psychometrika, 1945, 10,
255-282.
6. Kuder, G. F., and Richardson, M. W. The theory and estimation of test reliability.
Psychometrika, 1937, 2, 151-160.
7. Thorndike, R. L. Reliability. In Lindquist, E. F., Educational measurement.
Washington, D. C.: American Council on Education, 1951.


Manuscript received 5/19/52 


Revised manuscript received 6/28/52 


PSYCHOMETRIKA—VOL. 18, No. 1 
MARCH, 1953 


A LEAST-SQUARES SOLUTION FOR CASE IV OF THE LAW OF 
COMPARATIVE JUDGMENT 


W. A. GIBSON


UNIVERSITY OF NORTH CAROLINA 


Case IV of Thurstone's Law of Comparative Judgment is displayed
as a system of homogeneous, linear equations for which a least-squares solu-
tion is presented, using various conditional equations which fix the origin
and the unit of measurement. The computational load is, however, quite
heavy.

This paper will present a least-squares solution for Case IV of Thurstone's
Law of Comparative Judgment (1, 281-282). The aim is merely to show that
such a solution exists, rather than seriously to suggest its employment in all
practical applications of the method of paired comparisons.

Case IV of the Law of Comparative Judgment is as follows (1, 281):


$$S_j - S_k = \frac{1}{\sqrt{2}}\, X_{jk}\,(\sigma_j + \sigma_k), \tag{1}$$

where

$S_j$ = scale value or average perceived position of stimulus j on the
psychological continuum,

$S_k$ = scale value or average perceived position of stimulus k on the
psychological continuum,

$\sigma_j$ = discriminal dispersion or standard deviation of the distribution
of perceived positions of stimulus j on the psychological continuum,

$\sigma_k$ = discriminal dispersion or standard deviation of the distribution
of perceived positions of stimulus k on the psychological continuum,

$X_{jk}$ = normal deviate corresponding to the proportion of empirical
judgment j > k.


The assumptions involved in this statement of the law are (a) that there exists 
a unidimensional psychological continuum along which the perceptions
of some attribute of a set of stimuli can be located; (b) that the distribution 
of perceptions of each of the stimuli along that psychological continuum is 
normal in form; (c) that the correlation between the paired perceptions of 
any two stimuli is zero; and (d) that the discriminal dispersions are all of the 
same order of magnitude (1, 273-281). 


By a consideration of the analytic geometry involved in the plots of
columns of the square table of X values, Thurstone has worked out an
ingenious and rapid approximate solution for the S and $\sigma$ values under Case
IV (2, 293-296). The solution to be presented here will have the advantage
of providing the best-fitting S and $\sigma$ values for a set of paired-comparison data,
but it will have the serious limitation of increasingly prohibitive computational 
labor as the number of stimuli becomes large. 

We may write equation (1) explicitly for several of the stimuli as follows: 


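$$
\begin{aligned}
S_1 - S_2 &= \tfrac{1}{\sqrt{2}}\, X_{12}\,(\sigma_1 + \sigma_2),\\
S_1 - S_3 &= \tfrac{1}{\sqrt{2}}\, X_{13}\,(\sigma_1 + \sigma_3),\\
S_1 - S_4 &= \tfrac{1}{\sqrt{2}}\, X_{14}\,(\sigma_1 + \sigma_4),\\
S_2 - S_3 &= \tfrac{1}{\sqrt{2}}\, X_{23}\,(\sigma_2 + \sigma_3),\\
S_2 - S_4 &= \tfrac{1}{\sqrt{2}}\, X_{24}\,(\sigma_2 + \sigma_4),\\
S_3 - S_4 &= \tfrac{1}{\sqrt{2}}\, X_{34}\,(\sigma_3 + \sigma_4).
\end{aligned}
\tag{1a}
$$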


Equations (1а) constitute a set of six linearly independent homogeneous 
linear equations in eight unknowns. Since the zero point and the unit of 
measurement are arbitrary matters (cf. 1, 281), we may obtain a unique, 
non-trivial solution in the case of four stimuli by specifying the origin and 
the unit of measurement by conditional equations such as the following: 


$$S_4 = 0, \tag{2}$$
and
$$\sigma_4 = 1. \tag{3}$$
Substituting equations (2) and (3) into equations (1a) and transposing, we 
get the following set of six linear equations in six unknowns: 


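$$
\begin{aligned}
S_1 - S_2 - \tfrac{1}{\sqrt{2}} X_{12}\,\sigma_1 - \tfrac{1}{\sqrt{2}} X_{12}\,\sigma_2 &= 0,\\
S_1 - S_3 - \tfrac{1}{\sqrt{2}} X_{13}\,\sigma_1 - \tfrac{1}{\sqrt{2}} X_{13}\,\sigma_3 &= 0,\\
S_1 - \tfrac{1}{\sqrt{2}} X_{14}\,\sigma_1 &= \tfrac{1}{\sqrt{2}} X_{14},\\
S_2 - S_3 - \tfrac{1}{\sqrt{2}} X_{23}\,\sigma_2 - \tfrac{1}{\sqrt{2}} X_{23}\,\sigma_3 &= 0,\\
S_2 - \tfrac{1}{\sqrt{2}} X_{24}\,\sigma_2 &= \tfrac{1}{\sqrt{2}} X_{24},\\
S_3 - \tfrac{1}{\sqrt{2}} X_{34}\,\sigma_3 &= \tfrac{1}{\sqrt{2}} X_{34}.
\end{aligned}
\tag{4}
$$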


Equations (4) may be solved by any of the methods of solution for r
independent linear equations in r unknowns. Thus a unique solution, except
for the origin and the unit of measurement, is possible with four stimuli,
while an overdetermined solution will be available for more than four stimuli.
For each stimulus, k, that is added, there will be added (k - 1) new equations,
while only two new unknowns, $S_k$ and $\sigma_k$, are introduced.
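
The counting argument above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `case_iv_counts` is hypothetical, and it assumes, as in equations (4), that one stimulus is used to fix the origin and the unit.

```python
def case_iv_counts(n_stimuli: int) -> tuple[int, int]:
    """Return (equations, unknowns) for equations (4) with N stimuli."""
    # One observation equation for every pair j < k.
    equations = n_stimuli * (n_stimuli - 1) // 2
    # S and sigma for every stimulus except the one fixed by (2) and (3).
    unknowns = 2 * n_stimuli - 2
    return equations, unknowns


for n in range(4, 8):
    eqs, unk = case_iv_counts(n)
    status = "square" if eqs == unk else "overdetermined"
    print(n, eqs, unk, status)
```

With four stimuli the system is square; from five stimuli onward the equations outnumber the unknowns, which is the situation in which a least-squares treatment becomes relevant.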


Equations (4) may be stated in matrix form as follows: 
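$$AB = C, \tag{5}$$
where the coefficient matrix A, the matrix of unknowns B, and the matrix of
constant terms C are defined below for the case of four stimuli:

$$
A = \begin{Vmatrix}
1 & -1 & 0 & -\tfrac{X_{12}}{\sqrt{2}} & -\tfrac{X_{12}}{\sqrt{2}} & 0\\
1 & 0 & -1 & -\tfrac{X_{13}}{\sqrt{2}} & 0 & -\tfrac{X_{13}}{\sqrt{2}}\\
1 & 0 & 0 & -\tfrac{X_{14}}{\sqrt{2}} & 0 & 0\\
0 & 1 & -1 & 0 & -\tfrac{X_{23}}{\sqrt{2}} & -\tfrac{X_{23}}{\sqrt{2}}\\
0 & 1 & 0 & 0 & -\tfrac{X_{24}}{\sqrt{2}} & 0\\
0 & 0 & 1 & 0 & 0 & -\tfrac{X_{34}}{\sqrt{2}}
\end{Vmatrix},
\quad
B = \begin{Vmatrix} S_1\\ S_2\\ S_3\\ \sigma_1\\ \sigma_2\\ \sigma_3 \end{Vmatrix},
\quad
C = \begin{Vmatrix} 0\\ 0\\ \tfrac{X_{14}}{\sqrt{2}}\\ 0\\ \tfrac{X_{24}}{\sqrt{2}}\\ \tfrac{X_{34}}{\sqrt{2}} \end{Vmatrix}.
$$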


Since A is square when only four stimuli are involved, the unique solution for
equation (5) in that case is simply
$$B = A^{-1}C. \tag{6}$$
For more than four stimuli, a solution for equation (5) can be obtained by
premultiplying that equation through by $(A'A)^{-1}A'$, leaving
$$B = (A'A)^{-1}A'C. \tag{7}$$



Equation (7) can be shown (cf. 3, 173) to be a least-squares solution 
in the sense that the S's and o’s it yields are such as to minimize the sum of 
the squared discrepancies of the entries in the matrix product AB from the 
corresponding entries in C. 
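
As a rough illustration of equation (7), the following Python sketch builds the rows of A and C for N stimuli under conditions (2) and (3), taken here to fix the last stimulus, and then solves the normal equations $(A'A)B = A'C$ by Gaussian elimination instead of forming the inverse explicitly. The function names, the elimination routine, and the numerical values are assumptions made for the example; the data are error-free values generated from equation (1), not data from any study.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def case_iv_least_squares(x_dev):
    """x_dev[j][k] (j < k) holds X_jk; returns (S, sigma) with S_N = 0, sigma_N = 1."""
    n = len(x_dev)
    last = n - 1
    rows, rhs = [], []
    for j in range(n):
        for k in range(j + 1, n):
            c = x_dev[j][k] / math.sqrt(2.0)
            # Unknowns are ordered S_1..S_{N-1}, sigma_1..sigma_{N-1}.
            row = [0.0] * (2 * n - 2)
            row[j] = 1.0
            row[last + j] = -c
            if k == last:
                rhs.append(c)          # S_N = 0 and sigma_N = 1 move to the right side
            else:
                row[k] = -1.0
                row[last + k] = -c
                rhs.append(0.0)
            rows.append(row)
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    atc = [sum(r[i] * v for r, v in zip(rows, rhs)) for i in range(p)]
    b = solve_linear(ata, atc)
    return b[:last] + [0.0], b[last:] + [1.0]


# Hypothetical error-free values, chosen only to exercise the sketch.
true_s = [2.0, -1.0, 0.5, 1.5, 0.0]
true_sigma = [0.5, 2.0, 1.5, 0.6, 1.0]
n = len(true_s)
x_dev = [[math.sqrt(2.0) * (true_s[j] - true_s[k]) / (true_sigma[j] + true_sigma[k])
          for k in range(n)] for j in range(n)]
s_hat, sigma_hat = case_iv_least_squares(x_dev)
print([round(v, 6) for v in s_hat])
print([round(v, 6) for v in sigma_hat])
```

Because the illustrative X values are generated without error, the fitted values should agree with the generating ones up to rounding. It may be worth noting that when all the dispersions are equal the coefficient matrix becomes singular, so the example deliberately uses unequal dispersions.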

A slightly different approach which will have a greater degree of symme- 
try and will involve more of the unknowns in each of the observation equations 
is to replace the conditional equations (2) and (3) by the following ones: 


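$$\sum_{j=1}^{N} S_j = 0, \tag{8}$$
and
$$\sum_{j=1}^{N} \sigma_j = N. \tag{9}$$
Equation (9) may be multiplied through by $\tfrac{1}{\sqrt{2}} X_{jk}$, to give
$$\frac{1}{\sqrt{2}}\, X_{jk} \sum_{i=1}^{N} \sigma_i = \frac{1}{\sqrt{2}}\, N X_{jk}. \tag{10}$$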


Adding equation (8) and the appropriate equation (10) to each of equations 
(1) and transposing, we get, for five stimuli, 


1 


+ &+ OSERETAX4 + atat o) o DAN Xa 
BS S, + Si + 8 + Te ( 2s tata) = UN Xs 
25, + $ + 8, + S+ Te Xu са + os +) = TaN Xu 
25 + + S+ S, + Хх=‹ тз + оз + о, )- EN Хаи 
S, 4- 28, = S + 8, + re Xa (а tube = ЕМ Хи 
8 +28, + 5, + Ss + Da Хи (а + «а +) = TaN Xu 
S, +25, + S,4 4, tUe оф m ) = + Хи 
Si + S, + 25, + 85+ Te Xu n + а, +o) = Son Xu 
5 + 8, +28, + 8, +з Х» (or + 02 Фа LEES 
Lb B+ E35 + Xu lor + + TON ET 


"an 


W. А. GIBSON 


19 


Each of equations (11) contains 2N - 3 unknowns, while some of equa-
tions (4) contain 4 unknowns, others 2. In all, there are 2N unknowns in
equations (11), while there are 2N - 2 unknowns in equations (4). Thus
the minimum number of stimuli for a solution by equations (11) is five, in
which case there will be ten equations and ten unknowns.


We may state equations (11) in matrix form as follows: 


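$$EF = G, \tag{12}$$
where the three matrices are as shown below for five stimuli:

$$
E = \begin{Vmatrix}
2 & 0 & 1 & 1 & 1 & 0 & 0 & \tfrac{X_{12}}{\sqrt{2}} & \tfrac{X_{12}}{\sqrt{2}} & \tfrac{X_{12}}{\sqrt{2}}\\
2 & 1 & 0 & 1 & 1 & 0 & \tfrac{X_{13}}{\sqrt{2}} & 0 & \tfrac{X_{13}}{\sqrt{2}} & \tfrac{X_{13}}{\sqrt{2}}\\
2 & 1 & 1 & 0 & 1 & 0 & \tfrac{X_{14}}{\sqrt{2}} & \tfrac{X_{14}}{\sqrt{2}} & 0 & \tfrac{X_{14}}{\sqrt{2}}\\
2 & 1 & 1 & 1 & 0 & 0 & \tfrac{X_{15}}{\sqrt{2}} & \tfrac{X_{15}}{\sqrt{2}} & \tfrac{X_{15}}{\sqrt{2}} & 0\\
1 & 2 & 0 & 1 & 1 & \tfrac{X_{23}}{\sqrt{2}} & 0 & 0 & \tfrac{X_{23}}{\sqrt{2}} & \tfrac{X_{23}}{\sqrt{2}}\\
1 & 2 & 1 & 0 & 1 & \tfrac{X_{24}}{\sqrt{2}} & 0 & \tfrac{X_{24}}{\sqrt{2}} & 0 & \tfrac{X_{24}}{\sqrt{2}}\\
1 & 2 & 1 & 1 & 0 & \tfrac{X_{25}}{\sqrt{2}} & 0 & \tfrac{X_{25}}{\sqrt{2}} & \tfrac{X_{25}}{\sqrt{2}} & 0\\
1 & 1 & 2 & 0 & 1 & \tfrac{X_{34}}{\sqrt{2}} & \tfrac{X_{34}}{\sqrt{2}} & 0 & 0 & \tfrac{X_{34}}{\sqrt{2}}\\
1 & 1 & 2 & 1 & 0 & \tfrac{X_{35}}{\sqrt{2}} & \tfrac{X_{35}}{\sqrt{2}} & 0 & \tfrac{X_{35}}{\sqrt{2}} & 0\\
1 & 1 & 1 & 2 & 0 & \tfrac{X_{45}}{\sqrt{2}} & \tfrac{X_{45}}{\sqrt{2}} & \tfrac{X_{45}}{\sqrt{2}} & 0 & 0
\end{Vmatrix},
\quad
F = \begin{Vmatrix} S_1\\ S_2\\ S_3\\ S_4\\ S_5\\ \sigma_1\\ \sigma_2\\ \sigma_3\\ \sigma_4\\ \sigma_5 \end{Vmatrix},
\quad
G = \frac{N}{\sqrt{2}} \begin{Vmatrix} X_{12}\\ X_{13}\\ X_{14}\\ X_{15}\\ X_{23}\\ X_{24}\\ X_{25}\\ X_{34}\\ X_{35}\\ X_{45} \end{Vmatrix}.
$$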

The greater symmetry of these three matrices is apparent. 
For five stimuli the solution of equation (12) is simply
$$F = E^{-1}G, \tag{13}$$
while for more than five stimuli the least-squares solution of equation (12) is
$$F = (E'E)^{-1}E'G. \tag{14}$$


The matrix products E'E and E'G, which are needed in this solution, 
have some interesting properties that will not, however, be discussed further 
here. Suffice it to say that certain sections of those matrix products could 
more easily be formed by the application of simple rules than by actually 
carrying out the matrix multiplications. These rules are readily inferred 
once the matrix multiplications have been carried out symbolically for several 
different values of N. The same kind of regularity holds for the matrix 
products A’A and A’C of equation (7). 

A slight variant of the solution expressed in equation (14) would be to 
divide each of equations (11) through by the corresponding $X_{jk}$ before forming
the three matrices involved in the solution. This would have the disadvan- 
tage of complicating the formation of the coefficient matrix, but it would 
have the advantage of giving each of the observation equations equal im- 
portance in the minimizing process. By contrast, consider the effect of 
multiplying, let us say, the first of equations (11) through by 100 before 
getting the least-squares solution for a problem involving more than five 
stimuli. The resulting solution would then be such as to give a very good fit
for the initial first equation, with relatively little attention having been given
to the agreement between the right and left members of the other observation
equations when the S and $\sigma$ values that have been found are substituted into
them.
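
The effect of multiplying one observation equation through by a constant can be seen in a deliberately tiny case. The sketch below is not drawn from the paper: it uses three hypothetical equations in a single unknown x (x = 1, x = 2, x = 3) and the closed-form least-squares estimate for equations of the form c x = t.

```python
def least_squares_1d(coeffs, targets):
    """Least-squares x for the equations coeffs[i] * x = targets[i]."""
    return sum(c * t for c, t in zip(coeffs, targets)) / sum(c * c for c in coeffs)


coeffs, targets = [1.0, 1.0, 1.0], [1.0, 2.0, 3.0]

# Every equation carries the same weight in the minimizing process.
x_equal = least_squares_1d(coeffs, targets)

# The first equation is multiplied through by 100 before the fit.
scaled_coeffs = [100.0 * coeffs[0]] + coeffs[1:]
scaled_targets = [100.0 * targets[0]] + targets[1:]
x_scaled = least_squares_1d(scaled_coeffs, scaled_targets)

print(x_equal, x_scaled)
```

With equal weights the estimate is the mean of the three targets; after the scaling it lies almost exactly on the value demanded by the first equation, which is the behavior described in the text for equations (11).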

It would of course be possible to obtain a least-squares solution for 
equations (1) by using still other combinations of conditional equations for 
the purpose of fixing the origin and the unit of measurement. For example, 
equations (2) and (9), or (8) and (3), or any other pair of analogous equations 
could be used at the discretion of the investigator.

In practical problems with the method of paired comparisons it is often 
necessary to eliminate certain of the linear observation equations because of 
the instability (when $p_{jk}$ is greater than .95 or less than .05) or indeterminacy
(when $p_{jk}$ is equal to 1 or 0) of the X values involved. Such a reduction of the
number of rows in E and G would, of course, destroy the complete lexical
order of the equations and the simple rules of formation of the matrix products.
One way of getting around this complication would be to work with several 
overlapping sub-sets of stimuli, for each of which all X values are reasonably 
stable, and then to combine the resulting overlapping sets of scale values 
into a single composite set by some graphical fitting procedure. This same 
approach could be used to lighten the computational load even when all X 
values are useable, although it and other such labor-saving reductions would 
work against the primary purpose of a least-squares solution. The limiting
case of such an elimination would be to retain, on the basis of appropriate
reliability considerations, only a "best" set of 2N equations containing all
of the unknowns. The matrix E would then be a square matrix, and hence the
unique solution for equation (12) would be simply equation (13). A still
shorter solution, if such a complete reduction is contemplated, would be by
means of equation (6), for a square matrix A is of order two less than a square
matrix E, so that equation (6) involves 2N - 2 rather than 2N unknowns,
2N - 2 rather than 2N observation equations, and the computation of an in-
verse of order 2N - 2 rather than 2N. These complete elimination pro-
cedures have little to recommend them, however, for the computational load
will still be so heavy that it can hardly be regarded as worth while if it does
not lead to some kind of best fit.

Of course the main practical disadvantage of the solution described here 
is the computational labor involved in calculating the inverse of a matrix of 
order 2N, 2N — 1 or 2N — 2, even for the smallest possible N. Accordingly, 
this solution is perhaps, for the present at least, more of theoretical than of
practical interest.


REFERENCES 


1. Thurstone, L. L. A law of comparative judgment. Psychol. Rev., 1927, 34, 273-286.
2. Thurstone, L. L. Stimulus dispersions in the method of constant stimuli. J. exp.
   Psychol., 1931, 15, 284-297.
3. Turnbull, H. W., and Aitken, A. C. An introduction to the theory of canonical
   matrices. London: Blackie and Son, Limited, 1932.


Manuscript received 2/18/52 


Revised manuscript received 7/10/52 




PSYCHOMETRIKA—VOL. 18, No. 1 
MARCH, 1953 


AN ANALYTICAL SOLUTION FOR APPROXIMATING 
SIMPLE STRUCTURE IN FACTOR ANALYSIS 


JOHN B. CARROLL
HARVARD UNIVERSITY 


It is proposed that a satisfactory criterion for an approximation to
simple structure is the minimization of the sums of cross-products (across
factors) of squares of factor loadings. This criterion is completely analytical
and yields a unique solution; it requires no plotting, nor any decisions as to the
clustering of variables into subgroups. The equations involved appear to be
four factors the
reanalyzed. The presence of
hyperplanar fit, which the investigator may
desire to adjust by graphical rotations; the smaller the number of
such tests,


A criticism of current practice in multiple-factor analysis is that the
transformation of the initial factor matrix F to a rotated, "simple structure"
matrix V must apparently be accomplished by methods which allow con-
siderable scope for subjective judgment. There has been much discussion
of whether anything approaching a unique solution can be achieved under
these conditions; the customary answer is that highly similar solutions can
be reached by two analysts working independently, provided that they
follow the same set of principles. This, of course, is not an entirely satis-
factory answer. Graphical rotation unfortunately partakes more of art than
of science. The few efforts to reduce subjectivity in rotation to simple structure
have not been completely successful. Horst's methods (1) depend upon
subjective decisions regarding the sub-grouping of variables, and Tucker's
semi-analytical method (4) involves the use of graphical methods as an
aid in selecting such sub-groups.

It is the purpose of this paper to present a method for approximating
simple structure which completely avoids subjective decisions. The possibility
of alternative criteria must be admitted, but the present method appears
to lead rather closely to the type of solution which is desired in multiple-
factor analysis. Most important, it yields a unique solution. The reason for
emphasizing that the method yields only an approximation to simple structure
will be discussed later.


Thurstone (3, 335) lists the following desirable characteristics of a 
simple structure matrix: 


(1) Each row of the oblique factor matrix V should have at least one
zero.

(2) For each column p of the factor matrix V there should be a distinct
set of r linearly independent tests whose factor loadings $v_{jp}$ are zero.

(3) For every pair of columns of V there should be several tests whose
entries $v_{jp}$ vanish in one column but not in the other.

(4) For every pair of columns of V, a large proportion of the tests 
should have zero entries in both columns. This applies to factor problems 
with four or five or more common factors. 

(5) For every pair of columns there should preferably be only a small 
number of tests with non-vanishing entries in both columns. 


It is obvious that there could hardly be any single mathematical expression 
which would embody all these characteristics. If we consider characteristics
(3), (4), and (5), however, it would seem that some sort of inner-product 
function of the columns of V should be at a minimum if simple structure is 
to be attained. It might occur to one, for example, that the sum of the non- 
diagonal entries in the matrix product V’V should be as close to zero as 
possible. This solution must be ruled out because under orthogonal trans- 
formation of V a zero sum can be achieved merely when positive and negative 
cross-products balance. If we form, from V, a new matrix T containing the
squares of the entries in V, we can minimize the sum of the non-diagonal
entries in T'T and obtain an approximation to simple structure. Such a
criterion appears to satisfy rather well (within the limitations of a single 
analytical expression) the characteristics of simple structure listed by Thur- 
stone. The similarity to least-squares criteria used in other branches of 
statistics will be evident. 
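
A minimal sketch of the criterion just described may make it concrete: square every entry of V to form T, then sum the entries of T'T on one side of the diagonal. The function name and the two small loading matrices are hypothetical and serve only as an illustration.

```python
def simple_structure_criterion(v):
    """Sum over factor pairs p < q of sum_j (v_jp ** 2) * (v_jq ** 2).

    This equals the sum of the entries of T'T on one side of the diagonal,
    where T holds the squares of the entries in V.
    """
    n_factors = len(v[0])
    total = 0.0
    for p in range(n_factors):
        for q in range(p + 1, n_factors):
            total += sum(row[p] ** 2 * row[q] ** 2 for row in v)
    return total


# Hypothetical loadings: every test loads on one factor only.
pure = [[0.9, 0.0], [0.0, 0.8], [0.7, 0.0]]
# The same matrix with one factorially complex test in the last row.
mixed = [[0.9, 0.0], [0.0, 0.8], [0.6, 0.8]]

print(simple_structure_criterion(pure))
print(simple_structure_criterion(mixed))
```

Squaring the loadings before taking cross-products prevents positive and negative products from cancelling, so the sum can reach zero only when no test has non-vanishing entries in two columns at once.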

The following notation will be the basis of the subsequent development: 


j = 1, 2, ..., n (tests);
m, k = 1, 2, ..., s (arbitrary reference factors);
p, q = 1, 2, ..., t (rotated factors); s = t;
$F = \| a_{jm} \|$ = the initial matrix of projections of n test vectors on s
arbitrary reference factors;
$\Lambda = \| \lambda_{mp} \|$ = the matrix of direction cosines of reference vectors of
rotated factors;
$V = \| v_{jp} \|$ = the matrix of projections of test vectors on rotated
reference vectors;
$T = \| t_{jp} \| = \| v_{jp}^2 \|$ = matrix of squares of entries in V;
$\Omega = \| \omega_{pq} \|$ = a square symmetric matrix of inner-products of
columns of T.


It should be noted that F can be any initial orthogonal matrix; it may have
been obtained by the centroid method, the multiple-group method, or any
other method producing a matrix such that FF' yields the correlation matrix
plus a matrix of residuals.

The proposed criterion for approximating simple structure is that the 
sum of the non-diagonal elements of $\Omega$ be a minimum. That is, if we use
only the entries on one side of the diagonal,

$$f = \sum_{p<q} \omega_{pq} = \text{a minimum}. \tag{1}$$


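Now, since
$$F\Lambda = V, \tag{2}$$
it is evident that
$$t_{jp} = v_{jp}^2 = \Bigl(\sum_{m=1}^{s} a_{jm}\lambda_{mp}\Bigr)^{2}, \tag{3}$$
and equation (1) becomes
$$
f = \sum_{p<q} \omega_{pq} = \sum_{p<q} \sum_{j=1}^{n} t_{jp}\, t_{jq}
  = \sum_{p<q} \sum_{j=1}^{n} \Bigl(\sum_{m=1}^{s} a_{jm}\lambda_{mp}\Bigr)^{2} \Bigl(\sum_{m=1}^{s} a_{jm}\lambda_{mq}\Bigr)^{2}. \tag{4}
$$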


Multiplying out and appropriately collecting terms, we find that any $\omega_{pq}$
(which is, of course, a scalar) can be conveniently expressed by the equation
$$\omega_{pq} = M_p' A M_q, \tag{5}$$
where $M_p'$ is a row vector with s(s + 1)/2 entries, that is,
$$M_p' = \| \lambda_{1p}^2,\ \lambda_{2p}^2,\ \cdots,\ \lambda_{sp}^2,\ \lambda_{1p}\lambda_{2p},\ \lambda_{1p}\lambda_{3p},\ \cdots,\ \lambda_{(s-1)p}\lambda_{sp} \|;$$
$M_q$ is a column vector constructed in the same manner as $M_p$ but with a
change of subscript; and A is a square symmetric matrix of order s(s + 1)/2
which may be specified as equal to $\Xi'\Xi$, where $\Xi$ is an n X s(s + 1)/2 matrix,
the first s columns of which may be represented as $\| a_{jm}^2 \|$, and the remaining
s(s - 1)/2 columns of which may be represented as $\| 2a_{jm}a_{jk} \|$, k > m.
The order of the entries, with respect to combinations of the subscripts m
and k, matches that in $M_p$. Thus,
$$\Xi = \| a_{jm}^2 \mid (2a_{jm}a_{jk}) \|.$$


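For example, if s = t = 3,

$$
\Xi = \begin{Vmatrix}
a_{11}^2 & a_{12}^2 & a_{13}^2 & 2a_{11}a_{12} & 2a_{11}a_{13} & 2a_{12}a_{13}\\
\cdots & \cdots & \cdots & \cdots & \cdots & \cdots\\
a_{j1}^2 & a_{j2}^2 & a_{j3}^2 & 2a_{j1}a_{j2} & 2a_{j1}a_{j3} & 2a_{j2}a_{j3}\\
\cdots & \cdots & \cdots & \cdots & \cdots & \cdots\\
a_{n1}^2 & a_{n2}^2 & a_{n3}^2 & 2a_{n1}a_{n2} & 2a_{n1}a_{n3} & 2a_{n2}a_{n3}
\end{Vmatrix},
$$

and $\Xi'\Xi$ becomes

$$
\begin{Vmatrix}
\sum a_{j1}^4 & \sum a_{j1}^2 a_{j2}^2 & \sum a_{j1}^2 a_{j3}^2 & 2\sum a_{j1}^3 a_{j2} & 2\sum a_{j1}^3 a_{j3} & 2\sum a_{j1}^2 a_{j2} a_{j3}\\
\sum a_{j1}^2 a_{j2}^2 & \sum a_{j2}^4 & \sum a_{j2}^2 a_{j3}^2 & 2\sum a_{j1} a_{j2}^3 & 2\sum a_{j1} a_{j2}^2 a_{j3} & 2\sum a_{j2}^3 a_{j3}\\
\sum a_{j1}^2 a_{j3}^2 & \sum a_{j2}^2 a_{j3}^2 & \sum a_{j3}^4 & 2\sum a_{j1} a_{j2} a_{j3}^2 & 2\sum a_{j1} a_{j3}^3 & 2\sum a_{j2} a_{j3}^3\\
2\sum a_{j1}^3 a_{j2} & 2\sum a_{j1} a_{j2}^3 & 2\sum a_{j1} a_{j2} a_{j3}^2 & 4\sum a_{j1}^2 a_{j2}^2 & 4\sum a_{j1}^2 a_{j2} a_{j3} & 4\sum a_{j1} a_{j2}^2 a_{j3}\\
2\sum a_{j1}^3 a_{j3} & 2\sum a_{j1} a_{j2}^2 a_{j3} & 2\sum a_{j1} a_{j3}^3 & 4\sum a_{j1}^2 a_{j2} a_{j3} & 4\sum a_{j1}^2 a_{j3}^2 & 4\sum a_{j1} a_{j2} a_{j3}^2\\
2\sum a_{j1}^2 a_{j2} a_{j3} & 2\sum a_{j2}^3 a_{j3} & 2\sum a_{j2} a_{j3}^3 & 4\sum a_{j1} a_{j2}^2 a_{j3} & 4\sum a_{j1} a_{j2} a_{j3}^2 & 4\sum a_{j2}^2 a_{j3}^2
\end{Vmatrix},
$$
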
where all summations are over j.
We can generalize equation (5) and write

$$\Omega = M'AM, \tag{6}$$

where M is a matrix of order [s(s + 1)/2] X t, the successive columns of
which are constituted by the column vectors previously represented as $M_p$.
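
The identity behind equations (5) and (6) can be checked numerically. The sketch below builds the rows of the n by s(s + 1)/2 matrix of squares and doubled cross-products of the initial loadings, forms the square symmetric matrix A from it, builds the vectors M from the columns of the transformation matrix, and compares M'AM with the inner products of the squared rotated loadings computed directly. The function names are hypothetical; the three rows of loadings and the transformation matrix are taken from Table 1 purely for illustration.

```python
from itertools import combinations


def xi_row(a_row):
    """Squares of the loadings followed by doubled cross-products (m < k)."""
    s = len(a_row)
    return [a * a for a in a_row] + [2 * a_row[m] * a_row[k]
                                     for m, k in combinations(range(s), 2)]


def m_vector(lam_col):
    """Squares of the direction cosines followed by their cross-products (m < k)."""
    s = len(lam_col)
    return [l * l for l in lam_col] + [lam_col[m] * lam_col[k]
                                       for m, k in combinations(range(s), 2)]


def omega_via_a(f, lam):
    """Inner-product matrix computed as M'AM, with A formed from xi_row."""
    xi = [xi_row(row) for row in f]
    size = len(xi[0])
    a = [[sum(x[i] * x[j] for x in xi) for j in range(size)] for i in range(size)]
    t = len(lam[0])
    ms = [m_vector([lam[m][p] for m in range(len(lam))]) for p in range(t)]
    return [[sum(ms[p][i] * a[i][j] * ms[q][j]
                 for i in range(size) for j in range(size))
             for q in range(t)] for p in range(t)]


def omega_direct(f, lam):
    """Inner-product matrix computed directly from the squared entries of V."""
    t = len(lam[0])
    v = [[sum(row[m] * lam[m][p] for m in range(len(row))) for p in range(t)]
         for row in f]
    return [[sum(r[p] ** 2 * r[q] ** 2 for r in v) for q in range(t)]
            for p in range(t)]


f_rows = [[0.303, -0.476], [0.361, -0.355], [0.380, -0.173]]
lam = [[0.974, 0.228], [0.228, -0.974]]
print(omega_via_a(f_rows, lam))
print(omega_direct(f_rows, lam))
```

The practical point of the factorization is that A depends only on the initial loadings, so it can be computed once, after which each trial transformation requires only the much smaller vectors M.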

It would be natural, at this point, to attempt to solve for the desired
transformation matrix $\Lambda$ which would minimize the criterion f. This would
be done by methods of the calculus. The partial derivative of equation (4)
for each $\lambda_{mp}$ would be determined and set equal to zero, under the restriction
that $\sum_{m=1}^{s} \lambda_{mp}^2 = 1$. The additional restriction could be made, if desired,
that $\Lambda'\Lambda = I$ in order to impose orthogonality upon the solution. The equa-
tions which would result from this process have been found to be complex,
even for the simple case of two factors, and do not appear readily soluble by
any known mathematical methods.

It therefore seems advisable to minimize the criterion by iterative
methods, that is, by systematically varying values of $\lambda_{mp}$ until the function
f becomes stationary at its minimal value. As will be shown below, this is
perfectly feasible by ordinary computational techniques when the number
of factors does not exceed three or four, or perhaps five. For a larger number
of factors, high-speed electronic computing equipment may enable a solution,
but it should be remembered that the matrices which are involved are of
order s(s + 1)/2. Thus, for ten factors, the criterion is a sum of scalar products
of matrices which have as many as 55 rows or columns, or both. Furthermore,
the number of iterations or trials which may be required will increase tre-
mendously. The laboriousness of the preparatory calculations to determine
the matrix A will depend also on the number of tests, although the number
of tests does not affect the principal part of the procedure, i.e., the iterations.

Let us suppose that we are interested in systematically varying the values
for one column of $\Lambda$, i.e., $\Lambda_x$, while the remaining columns stay constant.
(This implies that we are allowing an oblique structure; if orthogonality is
to be maintained, we should have to vary systematically two columns of $\Lambda$
while the remaining columns stay constant.) The one column which is being
varied is designated x, while the remaining columns are designated r. Under
these conditions the value of f changes only as a function of $\Lambda_x$, and since
$M_x$ is a function of $\Lambda_x$ we may write the function to be minimized as

$$f_x = \sum_{r=1}^{t-1} M_x' A M_r = M_x' A \Bigl(\sum_{r=1}^{t-1} M_r\Bigr), \tag{7}$$

and defining

$$M_R = \sum_{r=1}^{t-1} M_r,$$

we write equation (7) as

$$f_x = M_x' A M_R. \tag{8}$$


We will then be in a position to make iterations by using trial values
of $\Lambda_x$, determining $M_x'$, and postmultiplying by the column vector $AM_R$.
$AM_R$ will, of course, remain constant during the trials for minimizing $f_x$.
Having found this minimum, we can choose another column of $\Lambda$ for $\Lambda_x$,
recompute $AM_R$ using the previous results, and iterate to find a minimal
value of the new function $f_x$. If we proceed in this way until the function
has become stationary for every column of $\Lambda$, the solution is eventually
found. Experience seems to show that the solution converges quite rapidly.
The final solution will give two sets of values for each column of $\Lambda$, differing
only in sign. On the general assumption of a positive manifold we shall
take the value of $\Lambda$ which gives a positive sum of values in the corresponding
column of V. The direction in which each vector is to be taken can be easily
determined by premultiplying each column vector of $\Lambda$ by a row vector
consisting of the columnar sums of entries in F; the column of $\Lambda$ must be
reversed in sign if this product is negative.
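
The idea of varying the transformation until the criterion becomes stationary, followed by the sign rule, can be illustrated for the simplest possible case. The sketch below is not the column-by-column procedure described above: it assumes two orthogonal factors, so that the transformation matrix depends on a single angle, and it finds that angle by a coarse scan followed by step halving. All names and the loadings are hypothetical, and the loadings are constructed to have an exact simple structure.

```python
import math


def rotate(f, theta):
    """Return (V, Lambda) for the orthogonal two-factor transformation at angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    lam = [[c, s], [s, -c]]  # unit-length, mutually orthogonal columns
    v = [[row[0] * lam[0][p] + row[1] * lam[1][p] for p in range(2)] for row in f]
    return v, lam


def criterion(v):
    """Two-factor criterion: sum over tests of the product of squared loadings."""
    return sum(row[0] ** 2 * row[1] ** 2 for row in v)


def minimize_angle(f, coarse=90, tol=1e-10):
    """Coarse scan over [0, pi/2), then halve the step until it falls below tol."""
    step = (math.pi / 2) / coarse
    theta = min((i * step for i in range(coarse)),
                key=lambda th: criterion(rotate(f, th)[0]))
    while step > tol:
        step /= 2
        theta = min((theta - step, theta, theta + step),
                    key=lambda th: criterion(rotate(f, th)[0]))
    return theta


def reflect_columns(f, lam):
    """Reverse any column of Lambda whose product with the column sums of F is negative."""
    col_sums = [sum(row[m] for row in f) for m in range(len(lam))]
    out = [row[:] for row in lam]
    for p in range(len(lam[0])):
        if sum(col_sums[m] * lam[m][p] for m in range(len(lam))) < 0:
            for m in range(len(lam)):
                out[m][p] = -out[m][p]
    return out


# Hypothetical simple-structure loadings, disguised by a rotation of 0.4 radian.
v_true = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9], [0.5, 0.0]]
f_init, _ = rotate(v_true, 0.4)  # this form of Lambda is its own inverse
theta_hat = minimize_angle(f_init)
v_hat, lam_hat = rotate(f_init, theta_hat)
lam_hat = reflect_columns(f_init, lam_hat)
print(round(theta_hat, 6), criterion(v_hat))
```

For more than two factors, or for oblique solutions, a single angle no longer suffices and the search would have to proceed one column at a time as the text describes; the sign rule carries over unchanged.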


Numerical Examples


As an example of the use of the present criterion with a study involving
two factors, we use the data analyzed by Johnson and Reynolds (2). Table 1
shows the original centroid matrix F, the preparatory calculations leading
to the matrix A, the iterated solutions for M and hence for $\Lambda$, and the resulting
factor matrices for both the orthogonal and the oblique case. (Since the actual
procedure for iteration can be more usefully illustrated for the three-factor
case, this procedure is not shown for the Johnson-Reynolds data.) Table 1
also shows the matrix $\Omega$; the non-diagonal entry of $\Omega$ stands at .122 for the
orthogonal case and at .041 for the oblique case. Figure 1 shows a plot of


TABLE 1

Orthogonal and Oblique Analytical Solutions for the Data of
Johnson and Reynolds

                                                Orthogonal       Oblique
                                                 Solution        Solution
          F                    Ξ                  FΛ = V          FΛ = V
Test    I      II     a_j1^2  a_j2^2  2a_j1a_j2    A      B        A      B
  1   .303  -.476     .092    .227    -.288      .187   .533    -.232   .538
  2   .361  -.355     .130    .126    -.256      .271   .428    -.099   .436
  3   .380  -.173     .144    .030    -.131      .331   .255     .064   .265
  4   .697  -.080     .486    .006    -.112      .661   .237     .315   .256
  5   .810   .232     .656    .054     .376      .842  -.041     .638  -.017
  6   .801   .202     .642    .041     .324      .826  -.014     .608   .010
  7   .743   .265     .552    .070     .394      .784  -.089     .629  -.066
  8   .619   .281     .383    .079     .348      .667  -.133     .574  -.113
  9   .775   .183     .601    .033     .284      .797  -.002     .578   .021
 10   .758  -.161     .575    .026    -.244      .702   .330     .281   .350

              Λ (orthogonal)            Λ (oblique)
                 A      B                 A      B
           I   .974   .228          I   .548   .256
          II   .228  -.974         II   .836  -.967

         Ξ'Ξ = A                  M (orthogonal)        M (oblique)
          I      II    I·II           A      B            A      B
   I    2.268   .210   .703      I  .948   .052       I  .301   .066
  II     .210   .086  -.011     II  .052   .948      II  .699   .934
I·II     .703  -.011   .841   I·II  .222  -.222    I·II  .459  -.247

          M'AM = Ω (orthogonal)       M'AM = Ω (oblique)
                 A      B                   A      B
           A   2.396   .122           A   .700   .041
           B    .122   .134           B   .041   .144


FIGURE 1

Comparison of Two Orthogonal Solutions for the Data of Johnson and Reynolds (2)
(The broken lines represent Johnson and Reynolds' own solution; the solid lines
represent the analytical solution.)


the factor loadings of the tests on the original arbitrary reference axes; it 
also shows the rotated planes produced by (a) Johnson and Reynolds’ own 
(orthogonal) solution, and (b) an orthogonal solution produced by the 
present criterion. Figure 2 similarly shows (a) the oblique solution which 
the writer would be inclined to make by the method of graphical inspection, 
and (b) the oblique solution produced by the present criterion. 

As a numerical example of а three-f: 
Thurstone’s well-known “box problem’ 


matrix F and an oblique solution achieved by the method of the present 


Discussion 


The criterion proposed here must b. 
mation to simple structure precisely for res 


one can fit hyperplanes in a highly acceptable manner, and without certain 
of the disadvantages of solutions reached by the present analytical method. 

Let us consider, however, the advantages of the present method. 

(1) It is a unique solution attainable by solely analytical techniques and 
hence requires no subjective judgment at any point. Furthermore, it requires 
no factor plots. 

(2) It gives a delineation of the approximate location of the best-fitting
hyperplanes in the factor space. It is believed that an investigator would
have to make only small additional rotations to bring the solution to the
form which would satisfy graphical criteria.

(3) It takes account of all the data in a given study. If there are fac-
torially complex tests in a study, the criterion utilizes them
in defining hyperplanes. For example, in Figure 2,
tests 4 and 10 influence the positioning of plane A (and also of plane B).
Likewise, the many factorially complex "tests" in Thurstone's box problem
influence the positioning of the hyperplanes, as shown in Figure 3. At the
same time, the more tests there are to define a


FIGURE 2

Comparison of Two Oblique Solutions for the Data of Johnson and Reynolds (2).
(The broken lines represent an inspectional graphical solution; the solid lines
represent the analytical solution.)

TABLE 2 
An Oblique Analytical Solution for Thurstone's Box Problem 
F Е јр“ 
1 II III т XI HI £0 ва ЛЕШ A B С 
1 .659 —.736  .138| 1 .434 .542 .019 —.485  .091 —.102| 1 —.132 900 —.094 
2 .725  .180 —.656| 2 .526 .032 .430 .130 —.476 —.118| 2  .865 —.169 —.153 
3 .665  .537 .500| 3.442 .288 .250 .357  .332  .268| 3 —.113 —.105  .902 
4 .869 —.209 —.443| 4 .755 .044 .196 —.182 —.385  .0903| 4 .606 203 —.164 
5 .834  .182  .508| 5.696 .033 .258  .152  .424 .092| 5 —.167 264 .768 
6 .836  .519 .152| 6 .699 .269 .023  .434  .127 .079 6  .251 —.158 .686 
7 .856 —.452 —.269| 7 .733 .204 .072 —.387 —.230  .122| 7  .377 566 —.175 
8 .848 —.426  .320| 8 .719 .181 .102 —.361  .271 —.136| 8 —.158 744  .282 
9 .861 .416 —.299| 9 .741 .173 .089  .358 —.257 —.124| 9 .643 —.215  .296 
10 .880 —.341 —.354]10 .774 .116 .125 —.300 —.311 .121/10  .492 445 —.108 
11 .889 —.147  .436]11 .790 .021 .190 —.131  .388 —.06411 —.175 548 541 
12 .875 .485 —.093)12 .766 .235 .009  .424 —.081 —.045|12  .478 —.201 495 
13 .667 —.725  .109)13 .445 .526 .012 —.484  .073 —.07913 —.100 882 —.107 
14 .717  .246 —.619|14 .514 .061 .383  .176 —.444 —.15214  .847 —.218 —.090 
15 .634  .501 .522]15 .402 .251 .272  .318  .331 .262|15 —.152 —.075 .888 
16 .936  .257 .165)16 .876 .066 .027  .240  .154 .042116 .197 100  .581 
17 .966 —.239 —.083|17 .933 .057 .007 —.231 —.080  .02017 .297 474  .128 
18 .625 —.720 .166/18 .391 .518 .028 —.450  .104 —.12018 —.164 885 —.075 
19 .702 .112 —.650|19 .493 .013 .422  .079 —.456 —.073]119 .834 —.114 —.194 
20 .664  .536  .48820 .441 .287 .288  .356  .324 .262/20 —.102 —.108 .892 
A A 
ЈЕ II III DEE ШГ ШШ А B G 
I 8.469 2.181 1.800  .032 —.350 .380 I .207 .900 .382 
II 2.181 1.828 .877 —.707 .625  .117 II .274  —.889 .566 
TII 1.800  .377  .886 .683 —.497  .172 Ш —.915 ‚846 755 
LII .032 —.707  .633 8.529  .757 1.254 
LIII —.350 .625 —.497 .757 7.205 1.265 
или .380 .117 .172 1.254 1.265 1.510 
М’ M'AM =Q 
J II III LIU LII ПАП, A p с 
А .088 .075  .837  .081 – .272 —.251 А 2.009 .907  .267 
B .090 .790 .120 —.267 .104 —.308 В .307 2.485 265 
с Лб 2820 570 188 281 427 G .207  .265 2.756 
$f = \sum_{p<q} \omega_{pq} = .839$†


*Our A corresponds to Thurstone's B; our B corresponds to Thurstone's A; our C corresponds to
Thurstone's C.

†If we use the transformation matrix $\Lambda$ provided by Thurstone, f = 1.409.


FIGURE 3

Comparison of Two Oblique Solutions for Thurstone's Box Problem (3, 136, 228). (Extended
Vector Representation on Reference Axes II and III.)
Legend: (1) Thurstone's Solution; (2) Analytical Solution.


they are together, the more acceptably the hyperplane will be defined.
For example, in the Johnson-Reynolds data shown in Figures 1 and 2, the
relatively large concentration of tests 5, 6, 7, 8 and 9 in the configuration
makes for a good definition of plane B, close to where it would be put by
graphical inspection. This is true both for orthogonal and oblique solutions.
These considerations lead to the conclusion that the present criterion will
probably work best for well-designed factor studies where there are a large
number of factorially pure tests and a relatively small number of factorially
complex tests.
The disadvantages of the present method seem to be as follows: 
(1) The presence of factorially complex tests makes the primary axes
more highly correlated than they would be if placed by graphical methods
and may give rise to larger negative projections than would otherwise be


TABLE 3 


The Iterations (Skeletonized) for Analytical Solution of the Thurstone Box Problem 


Line 


M,’AMp Àj 


I IL TII yit dam amm = f. Bar Ul 


OONA WN H 


(Mp)! 0 1 1 
(AMp)’ 3.931 1.705 1.263 — 
МА 0 0 1 


0 0 * * 
74 .128 .289 * М 
0 0 1.263 0 


ж ж 

ж * 

0 1 
+.171 0 1.365 174 0 .985 

0 

1 

i 


030 0 0970 
030 0 970 
0  .030 .970 
0 H i H 

0 :085 .914 0 —.279] 1.219** 0 —.292 .956 
010 .085 .904 —.029 .005 —.279| 1.960 100 —.292  .954 
010 .085 .904 .029 —.095 —.279 1231 —.100 —.292  .954 
010 .075 .914 —.028 .096 —.263] 1259 100 —.275 .956 
010 .075 .914 — .028 —.096 —.263| 1231 —.100 —.275  .956 


= 171 0 | 1.321 = 174. .985 
171| 1.226 0 —.174 .985 
3 


oooooobo 


1&5 


6 


9 & 10 


11 & 12 


Iteration is started by arbitrarily choosing rotated vector A. The other vectors are
initially taken collinear with reference axes II and III. Line 1 is hence the sum of
the entries in $M_B'$ and $M_C'$, which have only unity in columns II and III, re-
spectively. The asterisks in the columns for $f_x$ and $\Lambda_x'$ show that these columns do
not apply to the initial computation of $(M_R)'$ and $(AM_R)'$. Asterisks will similarly
appear in the first two rows of computations for each new trial vector, e.g., in
lines 13-14, 24-25, etc. For convenience the computation sheet shows all column
vectors in transposed form.

$(AM_R)'$ is computed using the matrix A in Table 2. Summational checks should
be computed, but are not shown here.

Although it might appear that our first trial vector for A has been collinear with
reference vector I, we find that the minimum value of $f_A$ can be found by taking
it collinear with reference vector III, since by so doing we find $f_A = M_A'AM_R$ =
1.263, as shown in the column labeled $f_x$. If we had taken vector A collinear with
reference vector I, f would have been equal to 3.931. The columns labeled $\Lambda_x'$
have been filled out; $M_x'$ is always a function of $\Lambda_x'$.

We shall now start iteration by taking a small move using vectors I and III, using
both a positive and a negative value of $\lambda_{IA}$. .030 is arbitrarily placed in $M_A'$ in
column I; the remaining entries of $M_A'$ are determined from this. The sum of the
first three entries of $M_A'$ must equal unity; the entries in $\Lambda_A'$ are square roots of
these, and the last three columns of $M_A'$ are cross-products of entries in $\Lambda_A'$.
Finding each value of $f_A$, we see (in the column $f_x$) that neither move has the
effect of reducing $f_A$ from the value in line 3.

We now try a similar move, but using reference vectors II and III. It immediately
appears that $f_A$ decreases if we use some negative value of $\lambda_{IIA}$.

The wavy lines are used throughout the table to symbolize the arbitrary
variation of entries until the value of $f_x$ becomes stationary. In the present case, the
following values of $M_{IIA}$ were tried: .117, .250, .106, .128, .095, .067, and .085. A
graph of $f_A$ as a function of $M_{IIA}$ aided in finding the minimal value, which was
determined to be approximately as shown in line 8. A single asterisk was placed in
line 8 to show that a temporary minimum had been attained.

An attempt was made to reduce $f_A$ by fixing $M_{IIA}$ and varying $\lambda_{IA}$ and $M_{IIIA}$, but
to no avail. Both positive and negative square roots were employed.

A similar attempt was made using various values of $M_{IIA}$ and $\lambda_{IA}$, but without
any reduction in $f_A$. The vector in line 8 was therefore marked as the final one for
this trial, and another asterisk in line 8 was placed to signify this fact.




TABLE 3 (Continued) 


" МАМ AJ 
Line ош nt id ЭШ ши => т t ш 
13 | (Mg)' 0 .085 1.914 0 0 —.279 © Ф * 

14 | (АЛГь)' [8.520 .802 1.680 „501 —1.251 —.082 * * * * 
15 | М’ 0 1 0 0 0 0 .802 0 1 0 
16 0 .970 .030 0 0 -171| .814 0 985 174 
17 0 .970 030 0 0  —A71 .842 0 .985 —174 
18 .030 .970 0 ай 0 0 | 1.020 A74 .985 0 
19 1030 .970 0 фл 0 о | .746 |—.174 .985 0 
20 MES 0 i 0 0 i i > 0 
21 024 .976 0 +—.154 0 о | 14* |—156 988 0 
22 ‘Olt 976 .010 —.117 —.012 0.099] 762 |—118 .988 .100 
23 ‘Old 976 .010 —.117 +012 —.099| .748 |—.118 0988 —.100 
24 | (Mfg) |.02:1001 914 —154 0 279) > * * * 
25 | (AMg)' |8.998 1.881 1.108 — 1.834 —,269 —.324 * * * * 
26 | Mc' 0 0 1 0 0 0 1.108 0 0 1 
27 o0 f g 0 о i i o i 
28 0 036.964 0 0 .188| 1.075 0 191 .982 
29 + рам i i i i i i 982 
30 (од .027 964 06  .003  .16l 1,048" | .095 164 .982 
31 poo % | i i H Poae [P 
Line

13  We proceed to iterate trial vector B; therefore $M_R'$ contains sums of entries in the
$M_A'$ just computed (line 8) and $M_C'$, which, as pointed out for line 1, corresponded
to a vector collinear with reference vector III.

14  $(AM_R)'$ is computed using $M_R'$ in line 13 and the matrix A of Table 2.

15  We choose an initial vector collinear with reference vector II because this yields
the smallest value of $f_x$.

16-17  An unsuccessful move using vectors II and III.

18-21  A move using vectors I and II; a negative value of $M_{IB}$ reduced $f_x$, so iteration
proceeded in this direction (line 20) until a stationary value of $f_x$ was achieved.
Again, a graph of the function aided in locating the minimizing values.

22-23  An unsuccessful move using vectors I and III. Another trial with II and III was
taken, unsuccessfully, and is not shown here. The vector represented in line 21 was
chosen as final for this trial.

24  Since we now proceed with vector C, $M_R'$ is the sum of corresponding entries in
$M_A'$ and $M_B'$, lines 8 and 21.

26  The initial vector for C is chosen collinear with III because it gives the smallest $f_x$.

27-28  Iteration varying II and III, reducing $f_x$ to 1.075.

29-30  Iteration varying I and II, reducing $f_x$ to 1.048.

31  Unsuccessful iteration varying I and III, without reduction of $f_x$. The vector in
line 30 is chosen as final. In general, iteration proceeds until all possible combina-
tions of vectors have been tried at least once.




TABLE 3 (Continued) 


МАМ A,! 

Line I и YII ТШ BITE LIII =f. I II III 
32 | (М;)' .033 1.003 .964 —.138 -093 -161 * * $ x 
33 | (AMg)' 4.176 1.940 1.186 — 1.003 -905 483 * ы x * 
34 | Ma’ 0 :085 .914 0 0 —.279| 1.114 0 —.292 .956 
35 i 085 3 i i i i i -292 3 
36 .030 .085 .885 {051 —.163 —.274 1.009 —.173 —.292  .940 
37 E ЖҮ: i i i i i i i 
38 .030 .120 .850 060 —.160 —.319 1.007** |—.173 —.346 .922 
39 | (MRY 039 .147 1.814 076 —.067 —.158 53 = " 
40 | (АМ»)’ 3.875 .848 1.787 1.445 —1.448 -116 * м * 
41 | Mg’ .024 .976 0 —.154 0 0 .698 —.156 .988 0 
42 | E 8 i i i i i H В 
43 065 .885 .050 —.240 :057 —.211| .638** —.255  .941 —.224 
44 | (Mg)' :095 1.005 .900 —.180 —.103 —.530 * Е Ы ‘i 
45 | (АМр)' 4.395 1.877 1.193 —2.416 —1.401 —.848 * М * = 
46 | Мс’ 009 .027 .964 016 .093 161] .935 :095 .164  .982 
47 | ео i i i i H : 
48 -110 .310 .580 -185 .253 .424| .596** .332  .557  .702 
49 | (Mg) 175 1.195 .630 —.055 .310 .213 » * £ 
50 (АМ»)’ |5.133 2.455 1.171 —.408 2.834 .959 * * ж Н 
51 | М.’ .030 .120 .850 .060 —.160 —.319 .660 —.173 —.346  .922 
"AMI Lw Cd x. E OW AE 
53 | .088 .075 .837 081 —.272 —.251 .571** |—.297 —.274  .915 
54 | (Mp)! .198 .385 1.417 -266 —.019 178 * Ы Е " 
55 | (AMp)’ [5.129 1.288 1.965 3.102 —.250 .935 T * = i. 
56 | Mp’ :065 .885 .050 —.240 :057 —.211| .616 —.255 .941 —.224 
57 | Па 5 i i i H i 
58 090 .790 .120 —.267 104 —.308| .573** |—.300 .889 —.346 

— = 
59 | (Mg) 178 .865 .957 —.186 — 168 —.559 > Ж Ы ы 
60 |(АМ»)’ |4.914 1.850 1.864 —2.145 —2.056 —.956 * Е * * 
61 | Мс’ 110 .310 .580 .185 .253 .424| .533 332  .557  .762 
62 | dw ЗА i i i i i i i 
63 110 .320 .570 — .188 251  .497| Бон 8332 566 755 

Line

32  $(M_R)'$ is the sum of corresponding entries in lines 21 and 30.

34  Since we now have better information as [...] information in line 8 (the result of
previo[...]

[...] great drop in $f_x$ from line 30 to line 48, and likewise, the great
shift in $\Lambda_C'$ during this trial.

The third cycle of iterations for vectors A, B, and C. The changes in $f_x$ are becoming
smaller.




TABLE 3 (Concluded) 


| М,„'АМь As! 

Line I п Шш ш МП пи =f. I п ш 
64 | (Mg) | .200 1.110 .690 —.079 355 2419 * * * * 
65 | (АМ»)’ [5.220 2.452 1.184 —.597 2.929 .854 m * * * 
66 M4" .088 .075 .837 081 —.272 —.251| .575 —.297 —.274  .915 
SL. ТЫ. т 
68 .088 .075 .837 .081 —.272 —.251| .575*** —.297 —.274  .915 
69 | (Mr) | 198 .395 1.407 269 —.021 .176 zi ® * * 
70 | (АМ)! |5.134 1.294 163 3.117 —.247 .940 * * * * 
71 | Му .090 .790 .120 —.267 .104 —.308| .572 —.300 .889 —.346 
72 | Box Ui H i i i i i 
73 090 .790 .120 —.267 .104 —.308| .572*** |—.300  .889 —.346 

Line

64-68  A fourth round of iterations with vector A; though all possible directions of moves
were tried, none resulted in any reduction of .001 in $f_x$.

69-73  A fourth round of iterations with vector B; no shifts resulted in any reduction as
great as .001 in $f_x$. Since vectors A and B had not moved from their positions in the
third cycle, it was unnecessary to make further iterations with C (unless a higher
degree of accuracy had been wanted). The computations were therefore complete,
and the final vectors in lines 63, 68, and 73 were regarded as constituting the
analytical solution.

the case. One would like a criterion which would place the planes pretty
much as they would be placed by graphical methods, particularly where
the structure is as clear as it is in the Thurstone box problem. A possible way
out of this difficulty would be (a) to carry out the present solution, (b) use
some established rule to determine which are the factorially complex tests,
(c) eliminate the factorially complex tests from the matrix F, and (d) make
a new solution. As an arbitrary rule for selecting complex tests, it may be
suggested that a complex test be defined as one having entries greater than
.25 in two or more columns of the matrix V.
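
The arbitrary rule suggested above is easy to state as a computation. The following is only a minimal sketch, not part of the original method: the function name `complex_tests`, its parameters, and the small matrix `V` are hypothetical and chosen purely for illustration, and the rule is applied literally to signed entries.

```python
import numpy as np

def complex_tests(V, threshold=0.25, min_columns=2):
    """Return row indices of tests having entries > threshold
    in at least min_columns columns of V."""
    V = np.asarray(V, dtype=float)
    n_large = (V > threshold).sum(axis=1)   # count of large entries per test
    return np.flatnonzero(n_large >= min_columns)

# Hypothetical 4-test, 3-factor matrix V (illustrative values only).
V = np.array([
    [0.80, 0.10, 0.00],
    [0.40, 0.50, 0.10],
    [0.00, 0.30, 0.30],
    [0.10, 0.10, 0.90],
])

flagged = complex_tests(V)                                   # complex tests (0-based rows)
kept = np.setdiff1d(np.arange(V.shape[0]), flagged)          # rows of F to keep for step (c)
print(flagged.tolist(), kept.tolist())
```

The rows listed in `kept` would be the ones retained in the factor matrix before making the new solution in step (d).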
(2) The method requires a large amount of computational labor. This
should not present a real obstacle, however, if the method can be adapted
for solution by high-speed computing devices. It is also possible that a non-
iterative solution may ultimately be found.




REFERENCES 


1. Horst, Paul. A non-graphical method for transforming an arbitrary factor matrix into
a simple structure matrix. Psychometrika, 1941, 6, 79-99.
2. Johnson, D. M., and Reynolds, F. A factor analysis of verbal ability. Psychol. Rec.,
1941, 4, 183-195.
3. Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947.
4. Tucker, L. R. A semi-analytical method of factorial rotation to simple structure.
Psychometrika, 1944, 9, 43-68.


Manuscript received 8/1/52 


PSYCHOMETRIKA—VOL. 18, No. 1 
MARCH, 1953 


A NEW STATUS INDEX DERIVED FROM SOCIOMETRIC 
ANALYSIS* 


LEO KATZ
MICHIGAN STATE COLLEGE 


For the purpose of evaluating status in a manner free from the defi-
ciencies of popularity contest procedures, this paper presents a new method
of computation which takes into account who chooses as well as how many
choose. It is necessary to introduce, in this connection, the concept of attenua-
tion in influence transmitted through intermediaries.


Introduction

For a considerable time, most serious investigators of inter-personal
and inter-group relations have been dissatisfied with the ordinary indices
of “status,” of the popularity contest type. In the sociometric field, for
example, Jennings (1) says, “. . . it cannot be premised from the present
research that greater desirability per se attaches to a high [conventional
computation] choice-status as contrasted with a low choice-status in any
sociogroup without reference to its milieu and functioning.” However, in
the absence of better methods for determining status, only two alternatives
have been open to the investigator. He has been forced either to accept the
popularity index as valid, at least to first approximation, or to make near-
anthropological study of a social group in order to pick out the real leaders,
i.e., the individuals of genuinely high status.

The purpose of this paper is to suggest a new method of computing
status, taking into account not only the number of direct “votes” received
by each individual but, also, the status of each individual who chooses
the first, the status of each who chooses these in turn, etc. Thus, the proposed
new index allows for who chooses as well as how many choose.

For the present discussion, an operational definition of status is assumed,
status being defined by the question asked of the members of the group. The
same device, then, may be used to study influence, transmission of informa-
tion, etc.

*This work was done under the sponsorship of the Office of Naval Research.

The New Status Index

To exhibit the results of the “balloting,” we shall use the matrix repre-
sentation for sociometric data as given by Forsyth and Katz (2). An example
for a group of six persons appears below. In this example, A chooses only
F, B chooses C and F, C chooses B, D, and F, and so on. The principal diagonal
elements, by convention, are zeroes. The question asked could be, “Which
people in this group really know what is going on?”


                      Chosen
Chooser      A    B    C    D    E    F

   A         0    0    0    0    0    1
   B         0    0    1    0    0    1
   C         0    1    0    1    0    1
   D         1    0    0    0    1    0
   E         0    0    0    1    0    1
   F         1    0    0    1    0    0

Totals       2    1    1    3    1    4


In the Forsyth and Katz formulation, the 6 × 6 array above is referred
to as the choice matrix, $C$, with element $c_{ij}$ = response of individual $i$ to
individual $j$. Further, as pointed out by Festinger (3) for matrices whose
elements are 0 or 1, powers of $C$ have as elements the numbers of chains of
corresponding lengths going from $i$ through intermediaries to $j$. Thus,
$C^2 = (c_{ij}^{(2)})$, where $c_{ij}^{(2)} = \sum_k c_{ik}c_{kj}$; each component, $c_{ik}c_{kj}$, of $c_{ij}^{(2)}$ is equal
to one if and only if $i$ chooses $k$ and $k$ chooses $j$, i.e., there is a chain of length
two from $i$ to $j$. Higher powers of $C$ have similar interpretations.
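
The chain-counting interpretation can be checked directly on the six-person example. This is a small sketch using NumPy; the variable names are illustrative and not taken from the paper.

```python
import numpy as np

# Choice matrix C from the six-person example: row = chooser, column = chosen.
C = np.array([
    [0, 0, 0, 0, 0, 1],  # A
    [0, 0, 1, 0, 0, 1],  # B
    [0, 1, 0, 1, 0, 1],  # C
    [1, 0, 0, 0, 1, 0],  # D
    [0, 0, 0, 1, 0, 1],  # E
    [1, 0, 0, 1, 0, 0],  # F
])

C2 = C @ C                    # entry (i, j) counts chains i -> k -> j
direct = C.sum(axis=0)        # column sums of C: direct choices received
two_step = C2.sum(axis=0)     # column sums of C^2: two-step choices received

print(direct.tolist())        # [2, 1, 1, 3, 1, 4]
print(two_step.tolist())      # [7, 1, 1, 6, 3, 5]
print(C2[0, 3])               # 1: the single chain A -> F -> D
```

Note that chains are counted without excluding a return to the starting person; for instance, the chain A → F → A contributes to the diagonal of `C2`.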

The column sums of $C$ give the numbers of direct choices* made by
members of the group to the individual corresponding to each column. Also,
the column sums of $C^2$ give the numbers of two-step choices from the group
to individuals; column sums of $C^3$, numbers of three-step choices, etc. An
index of the type we seek, then, may be constructed by adding to the direct
choices all of the two-step, three-step, etc., choices, using appropriate weights
to allow for the lower effectiveness of longer chains. In order to construct
appropriate weights, we introduce the concept of “attenuation” in a link of
a chain.

*In the sequel, it is assumed that $C$ is a matrix of 0’s and 1’s.

It is necessary to make some assumptions regarding the effective func-
tioning of an existing link. The first assumption we make is common to
all sociometric work, namely, that our information is accurate and that,
hence, certain links between individuals exist; and where our information
indicates no link, there is no communication, influence, or whatever else
we measure. Secondly, we assume that each link independently has the same
probability of being effective. This assumption, obviously, is no more true
than is the previous one; however, it seems to be at least a reasonable first
approximation to the true situation. Thus, we conceive a constant $a$, de-
pending on the group and the context of the particular investigation, which
has the force of a probability of effectiveness of a single link. A $k$-step chain,
then, has probability $a^k$ of being effective. In this sense, $a$ actually measures
the non-attenuation in a link, $a = 0$ corresponding to complete attenuation
and $a = 1$ to absence of any attenuation. With this model, appropriate
weights for the column sums of $C$, $C^2$, etc. are $a$, $a^2$, etc., respectively.
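
The weighting scheme just described can be sketched as a truncated sum over chain lengths. This is an illustration under stated assumptions, not the paper's computing procedure: the function name `weighted_chain_sums` and the truncation at 200 terms are arbitrary choices, and $a = 0.5$ is simply the value the paper adopts in its numerical example. The sum settles down only when $a$ is small enough relative to the matrix.

```python
import numpy as np

# Choice matrix of the six-person example (row = chooser, column = chosen).
C = np.array([
    [0, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 0],
], dtype=float)

def weighted_chain_sums(C, a, n_terms=200):
    """Sum over k = 1..n_terms of a**k times the column sums of C**k."""
    u = np.ones(C.shape[0])
    row = u.copy()               # will hold u' C^k, the column sums of C^k
    total = np.zeros_like(u)
    weight = 1.0
    for _ in range(n_terms):
        row = row @ C            # advance from u' C^(k-1) to u' C^k
        weight *= a              # advance from a^(k-1) to a^k
        total += weight * row
    return total

t = weighted_chain_sums(C, a=0.5)
print(np.round(t, 3))
```

For this matrix and $a = 0.5$ the truncated sum agrees, to many decimal places, with the values of $t$ reported in the paper's numerical example.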

We have noted previously that the quantity $a$ depends upon both the
group and the context; we now examine this notion in greater detail. Suppose 
that our interest is in the communication problem of transmission of inform- 
ation or rumor through a group. It is quite evident that different groups 
will respond in different ways to the same information and, also, that a 
single group will exhibit different responses to various pieces of information. 
For example, the information that the new high-school principal is unmarried 
and handsome might occasion a violent reaction in a ladies’ garden club 
and hardly a ripple of interest in a luncheon group of the local chamber 
of commerce. On the other hand, the luncheon group might be anything 
but apathetic in its response to information concerning a fractional change
in credit buying restrictions announced by the federal government.

Some psychological investigations have been directed at exactly this
point. It is possible that these, or subsequent studies, may reveal that $a$
is or is not relatively constant among all existing links in a group with respect
to a particular context. If it should appear that $a$ is not relatively constant,
it will be necessary to consider more complicated models. For present pur-
poses, we shall assume $a$ is relatively constant and that, either by investigation
or omniscience, its value is known.

Let $s_j$ be the sum of the $j$th column of the matrix $C$ and $s$ a column
vector with elements $s_j$. In the example above, e.g., the row vector $s'$ =
(2, 1, 1, 3, 1, 4). We wish to find the column sums of the matrix

\[ T = aC + a^2C^2 + \cdots + a^kC^k + \cdots = (I - aC)^{-1} - I. \]

$T$ has elements $t_{ij}$ and column sums $t_j = \sum_i t_{ij}$. Let $t$ be a column vector
with elements $t_j$ and $u$ be a column vector with unit elements. Then

\[ t' = u'[(I - aC)^{-1} - I]. \]

Multiplying on the right by $(I - aC)$ we have

\[ t'(I - aC) = u' - u'(I - aC) = au'C, \]

and by transposition,

\[ (I - aC')t = aC'u. \]

But, $C'u$ is a column vector whose elements are the row sums of $C'$, i.e.,
the column sums of $C$; therefore $C'u = s$. Finally, dividing through by $a$,
we have

\[ \left(\frac{1}{a}I - C'\right)t = s. \]




Thus, given $a$, $C$, and $s$, we have only to solve the system of linear equations
above to obtain $t$. Actually, we compute no powers of $C$ although our original
summation was over all powers. The process breaks down in case $1/a$ is not
greater than the largest characteristic root of $C$. (See 5, 168). Some experi-
ence with computations indicates that reasonable, general-purpose values of
$1/a$ are those between the largest root and about twice that root. It is evident
that the effect of longer chains on the index will be greater for smaller values
of $1/a$. Finally, it is a real advantage in computations to choose $1/a$ equal to
an integer. In the numerical example of the following section, the largest root
is less than 1.7 and $1/a$ is taken equal to 2.0. There is an extensive literature
on bounds for such roots; in this connection, see the series of papers by
A. Brauer (6). For matrices of non-negative elements, a simple upper bound
for the largest root is the greatest row (column) sum; this bound is attained
when all row (column) sums are equal. For the solution, several abbreviated
methods of computation are available. See, e.g., Dwyer (4).
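
With present-day tools the linear system can be solved directly. The sketch below is one possible arrangement, not the abbreviated hand methods the paper refers to: the function name `katz_t` is hypothetical, NumPy's general solver stands in for those methods, and the guard on $1/a$ implements the breakdown condition stated above by comparing $1/a$ with the largest absolute eigenvalue of $C$.

```python
import numpy as np

def katz_t(C, a):
    """Solve ((1/a) I - C') t = s, where s holds the column sums of C."""
    C = np.asarray(C, dtype=float)
    largest_root = np.max(np.abs(np.linalg.eigvals(C)))
    if 1.0 / a <= largest_root:
        raise ValueError("1/a must exceed the largest characteristic root of C")
    n = C.shape[0]
    s = C.sum(axis=0)
    return np.linalg.solve(np.eye(n) / a - C.T, s)

# Choice matrix of the six-person example (row = chooser, column = chosen).
C = np.array([
    [0, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 0],
])

t = katz_t(C, a=0.5)     # 1/a = 2.0, as in the paper's numerical example
print(np.round(t, 3))
```

Choosing $1/a$ below the largest root (for example $a = 0.75$ for this matrix) makes `katz_t` raise an error rather than return a meaningless solution.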

The usual index of status is obtained by dividing the column sum $s_j$ by
$n - 1$, the number of possible choices. Using the same notion, we obtain as
divisor of the $t_j$, with $(n - 1)^{(k)} = (n - 1)(n - 2) \cdots (n - k)$,

\[ m = a(n - 1) + a^2(n - 1)^{(2)} + a^3(n - 1)^{(3)} + \cdots \]

\[ = (n - 1)!\, a^{n-1} e^{1/a}, \text{ approximately.*} \]

Finally, then, the new status index vector is given by $(1/m)t$, where $t$ is the
vector solution to the system of equations above.
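
The divisor $m$ and its approximation can be compared numerically. In this sketch the function names `m_exact` and `m_approx` are illustrative only; the exact sum stops at $k = n - 1$ because every later term contains a zero factor.

```python
import math

def m_exact(n, a):
    """Sum of a**k * (n-1)(n-2)...(n-k) for k = 1, ..., n-1."""
    total = 0.0
    falling = 1.0
    for k in range(1, n):
        falling *= (n - k)        # (n-1)(n-2)...(n-k)
        total += a ** k * falling
    return total

def m_approx(n, a):
    """Approximation (n-1)! * a**(n-1) * exp(1/a)."""
    return math.factorial(n - 1) * a ** (n - 1) * math.exp(1.0 / a)

print(m_exact(6, 0.5))             # 26.25
print(round(m_approx(6, 0.5), 2))  # 27.71
```

For $n = 6$ and $a = 1/2$ these are the two values quoted in the paper's numerical example.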


A Numerical Example 


We shall consider the example of the group of six persons whose choice
matrix is given at the beginning of the paper. For this group, conventional
technique of dividing column sums by $n - 1 = 5$ produces the

Conventional Status Vector = (.4, .2, .2, .6, .2, .8).

Going beyond the surface question of “How many choose X?” to the
deeper question of “Who chooses X?” reveals certain important features of
this artificially constructed group. F and D are, apparently, of highest status.
A, however, is chosen by both of these though he is not chosen by any of
the “small fry” in the group. Is not A’s status higher than is indicated by
the conventional computation?

*The approximation improves with increasing $n$. The relative error < [...].
For example, when $n = 25$, $a = 1/2$, the relative error < $4 \times 10^{-18}$.




Other features might be pointed out, such as that F’s choice of D is
not reciprocated, etc. But this is enough to illustrate the well-known de-
ficiencies in the conventional computations. We pass now to actual computa-
tion of the vector $t$.

We first write out the required equations, using $a = 1/2$ for simplicity.
The coefficients of $t_1, t_2, \ldots, t_6$ are the negative of the transpose of $C$ plus
$1/a = 2$ added to each principal diagonal term. The equations are

\[
\begin{aligned}
2t_1 - t_4 - t_6 &= 2 \\
2t_2 - t_3 &= 1 \\
-t_2 + 2t_3 &= 1 \\
-t_3 + 2t_4 - t_5 - t_6 &= 3 \\
-t_4 + 2t_5 &= 1 \\
-t_1 - t_2 - t_3 - t_5 + 2t_6 &= 4,
\end{aligned}
\]

and the resulting values of $t_1, \ldots, t_6$ are 13, 1, 1, 11.4, 6.2, and 12.6, res-
pectively. The approximate computation of $m = 27.71$ agrees fairly well,
even here with $n = 6$ only, with the exact value of 26.25. Dividing the $t_j$ by
27.71 gives the

New Status Vector = (.47, .04, .04, .41, .22, .45).


Comparison of the new with the conventional computation above indicates 
that every change is in the appropriate direction to overcome the short- 
comings in the index pointed out previously and the new status indices are 
in much more nearly correct relative position. 
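
The comparison between the two indices can be laid out side by side. This is a short sketch that takes the column sums, the values of $t$, and the divisor 27.71 from the text as given; the helper `ranking` and the variable names are illustrative, and ties are listed in alphabetical order.

```python
import numpy as np

names = ["A", "B", "C", "D", "E", "F"]
s = np.array([2, 1, 1, 3, 1, 4], dtype=float)   # column sums of C
t = np.array([13, 1, 1, 11.4, 6.2, 12.6])       # solution reported in the text
n = 6
m = 27.71                                       # approximate divisor from the text

conventional = s / (n - 1)
new_status = t / m

def ranking(x):
    """Names ordered from highest to lowest index (ties kept in input order)."""
    return [names[i] for i in np.argsort(-x, kind="stable")]

print(np.round(conventional, 2).tolist(), ranking(conventional))
print(np.round(new_status, 2).tolist(), ranking(new_status))
```

Under the conventional index A ranks third, behind F and D; under the new index A moves to first place, which is the change the paper argues for.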


REFERENCES 
1. Jennings, H. H. Leadership and isolation. New York: Longmans, Green, 1943 and 1950.
2. Forsyth, E., and Katz, L. A matrix approach to the analysis of sociometric data:
preliminary report. Sociometry, 1946, 9, 340-347.
3. Festinger, L. The analysis of sociograms using matrix algebra. Human Relations, 1949,
2, 153-158.
4. Dwyer, P. S. Linear computations. New York: Wiley and Sons, 1951.
5. Ferrar, W. L. Finite matrices. London: Oxford Univ. Press, 1951.
6. Brauer, A. Limits for the characteristic roots of a matrix. Series of four papers in
Duke Mathematical Journal, I: 1946, 13, 387-395; II: 1947, 14, 21-26; III: 1948, 15,
871-877; IV: 1952, 19, 75-91.


Manuscript received 7/8/52 


Revised manuscript received 8/8/52 


PSYCHOMETRIKA—VOL. 18, No. 1 
MARCH, 1953 


A FACTOR ANALYSIS OF INTRA-TASK PERFORMANCE 
ON TWO PSYCHOMOTOR TESTS 


EDWIN A. FLEISHMAN*


USAF AIR TRAINING COMMAND 
HUMAN RESOURCES RESEARCH CENTER 


There are indications that even during the short time of administration
of a single psychomotor test, the ability or abilities sampled may shift materi-
ally in importance. It then becomes important to know the stages in which
these fluctuations occur, the stage at which the test is most complex, and
the stage at which the test most nearly measures one ability at a time. This
paper describes an application of factorial methods to this problem. Factor
analysis of inter-trial correlations on two models of the Rudder Control
Test revealed three factors, “Steadiness-Control,” “Precision of Movement,”
and “Strength.” The same factor pattern was confirmed in a separate
factor analysis on another sample in which the order of administration of
the tests was reversed. Implications are pointed out for future psychomotor
test development.
Introduction

It has generally been found that psychomotor tests, such as those used
in the Air Force classification program, are factorially complex; that is, they
tend to measure several abilities. For this reason the apparatus tests cur-
rently in use in the Aircrew Battery correlate quite highly with each other
and also with other tests in the battery (2, 3). Although these tests possess
considerable validity for pilot selection, higher over-all validity would be
contributed to the total battery if the individual psychomotor tests had
extremely low correlations with each other, even if the tests had somewhat
lower individual validities. While test development in other aptitude areas
is increasingly being aimed at the sampling of one ability category at a time
through insights provided largely by factor analysis, little progress in this
direction has been made in developing psychomotor tests. The implications
for developing such psychomotor tests would be especially important for
classification, since one could avoid having to weight into various composite
aptitude indices invalid variance along with the valid variance found to
exist in complex tests.†
*Motor Skills Research Laboratory, Lackland Air Force Base,
San Antonio, Texas. The opinions or conclusions contained in this report are those of
the author. They are not to be construed as reflecting the views or indorsement of the
Department of the Air Force.
†It is assumed here that it is possible to break down the variance in per-
formance of complex psychomotor tasks into [...]
functions. Whether the introduction of complexity of function (e.g., tasks requiring
coordination or integration of operations) introduces variance not reproducible by any
number of more analytical tasks remains a problem for future research.




There is also some indication that even during the short time period of 
the administration of a single psychomotor test, the ability or abilities 
sampled may shift materially in importance. If this is true, then it becomes 
important to know (1) at what stages in the task these systematic changes
in function occur, (2) at which stage the task is most complex in the number 
of abilities measured, and (3) at which stage the task is most nearly measuring 
one ability at a time. 

Factor analysis methods would appear to provide one of the most 
fruitful approaches to this problem. Information derived from such analyses 
may provide a basis for deriving part scores from such tests, in order to 
determine which of the factors isolated in a test are most valid. If certain 
trials are found more valid than others, it may be possible to emphasize the 
strongest factors in those trials in the design of new experimental models of 
the test. Or, as an alternative, trials yielding the highest and purest measure 
of certain factors could be scored separately and validated independently. 


Purpose 


The purpose of this paper is to describe an application of factorial 
methods to the trial scores of two forms of the Rudder Control Test, the 
Standard Rudder Control (CM120C) (3), and the Experimental Six-Target 
Rudder Control Tests (1, 4). In the Rudder Control Test, the examinee 
sits in a mock cockpit of an airplane. His own weight throws the seat off 
balance unless he applies correction by means of rudder foot-pedals. Pushing 
the right rudder pedal causes the apparatus to swing to the right, and pushing 
the left pedal causes it to swing to the left. The examinee’s task is to keep
the cockpit pointed directly at one of three target lights situated on a panel
before him. The task seems to require a keen appreciation of loss of balance
and a quick but not over-controlled correction made by leg action.

The Experimental Six-Target Rudder Control Test involves the same
apparatus as the Standard Model, except the examinee is provided with a
panel of six target lights to which he must successively shift the apparatus
as each is presented.

These tests were selected for several reasons. First, the Standard
Rudder Control Test generally has been found the most valid single test in
the Aircrew Battery. On the other hand, little is known about its factorial
content. Previous factor analyses of the Aircrew Battery have revealed it
to have relatively low communality with respect to the other Aircrew tests
(2, 3). Moreover, administration conditions of the test are such that more
than one factor is suspected. In addition, it has been shown in a previous
study (1), that the correlation between the Standard Rudder Control Test
and the Six-Target Rudder Control Test depends somewhat on which test is
given first. The question arises as to the extent to which the change in corre-
lation due to administration order can be explained by systematic changes
in the abilities involved at various stages of performance on each task.


Method 


The Standard Rudder Control Test and the Experimental Six-Target 
Rudder Control Test were administered to 698 pilot-cadets. In 356 cases, 
the Standard Model was administered before the Experimental Model. In 
342 cases, the Experimental Model was administered first. 

The eight-minute testing period for the Standard Model was divided 
into six one-minute trials and separated by 30-second rest periods. This is 
the standard operating procedure followed in all operational administrations 
of this test. The Experimental Model Test period was divided into four 
two-minute trials with 30-second rest periods. Separate scores were recorded 
at the end of each trial for each of the tests. 

The score derived from the Standard Model was the total time the
apparatus is held on target by the subject. The score derived from the
Experimental Six-Target Model was the number-of-targets achieved by the
subject. This test is of the self-pacing type, in which the subject must shift
the apparatus to as many successively presented targets as possible within
the testing period. He must hold the apparatus on target steadily for three
seconds before a new target is presented.*

Correlations among all the trial scores of both tests were obtained.
Separate matrices were obtained for each administration order of the two
tests. Each inter-trial correlation matrix was then factor analyzed by
Thurstone’s Centroid Method, and the axes rotated to simple structure.


*A time-on-target score was also derived from the Six-Target Test. The correlation
between number-of-targets and time-on-target scores was [...] Number-of-targets
score proved higher, this score has [...] the test.

†The correlation between [...] both tests was .58 when the Standard [...] the
Six-Target Test was given first. Although these [...] statistically significant.


Results

Table 1 presents the intercorrelations of trial scores for both tests when
the Standard Model was given first. Table 2 presents a similar matrix for
the sample in which the Six-Target Model was given first.

Separate factor analyses were made for each of the two correlation
matrices.


Factor Analysis of Trial Intercorrelations when the
Standard Test is Administered First

Four factors were extracted from the matrix of inter-trial correlations
obtained when the Standard Rudder Control Test was administered before



TABLE 1

Intercorrelations Between Trial Scores of the Standard and Six-Target Rudder Control
Tests when the Standard Model is Administered First*
(N = 356)

Test                           Trial   1   2   3   4   5   6   7   8   9  10

 1. Standard Rudder Control        1      74  64  70  57  48  43  35  33  29
 2. Standard Rudder Control        2  74      73  71  58  51  41  35  35  29
 3. Standard Rudder Control        3  64  73      66  62  56  45  40  40  33
 4. Standard Rudder Control        4  70  71  66      77  67  55  50  49  46
 5. Standard Rudder Control        5  57  58  62  77      71  54  50  50  49
 6. Standard Rudder Control        6  48  51  56  67  71      52  50  52  48
 7. Six-Target Rudder Control      1  43  41  45  55  54  52      82  75  70
 8. Six-Target Rudder Control      2  35  35  40  50  50  50  82      79  73
 9. Six-Target Rudder Control      3  33  35  40  49  50  52  75  79      76
10. Six-Target Rudder Control      4  29  29  33  46  49  48  70  73  76

*Decimal points are omitted.


TABLE 2

Intercorrelations Between Trial Scores of the Standard and Six-Target Rudder Control
Tests when the Six-Target Model is Administered First*
(N = 342)

Test                           Trial   1   2   3   4   5   6   7   8   9  10

 1. Six-Target Rudder Control      1      85  76  71  31  20  12  31  26  30
 2. Six-Target Rudder Control      2  85      89  85  35  25  21  38  30  30
 3. Six-Target Rudder Control      3  76  89      90  39  31  29  46  36  35
 4. Six-Target Rudder Control      4  71  85  90      41  34  30  44  34  37
 5. Standard Rudder Control        1  31  35  39  41      49  53  53  49  43
 6. Standard Rudder Control        2  20  25  31  34  49      52  50  44  39
 7. Standard Rudder Control        3  12  21  29  30  53  52      52  46  35
 8. Standard Rudder Control        4  31  38  46  44  53  50  52      62  50
 9. Standard Rudder Control        5  26  30  36  34  49  44  46  62      49
10. Standard Rudder Control        6  30  30  35  37  43  39  35  50  49

*Decimal points are omitted.


the Six-Target Rudder Control Model. Table 3 presents the centroid factor 
matrix obtained. Table 4 presents the orthogonal solution of rotated factor 
loadings obtained using the criteria of simple structure and positive manifold.
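
The first column of a centroid factor matrix can be sketched from the correlations in Table 1. The paper does not say how the diagonal entries were estimated, so this sketch assumes the common convention of inserting the highest correlation in each column as the communality estimate; the function name `first_centroid_loadings` is hypothetical, and the correlation between Six-Target trials 2 and 4 is read as .73. Later factors, which require residual matrices and sign reflection, are not attempted here.

```python
import numpy as np

# Upper triangle of Table 1, row by row (decimal points omitted as in the table).
upper = [
    [74, 64, 70, 57, 48, 43, 35, 33, 29],
    [73, 71, 58, 51, 41, 35, 35, 29],
    [66, 62, 56, 45, 40, 40, 33],
    [77, 67, 55, 50, 49, 46],
    [71, 54, 50, 50, 49],
    [52, 50, 52, 48],
    [82, 75, 70],
    [79, 73],
    [76],
]

n = 10
R = np.zeros((n, n))
for i, row in enumerate(upper):
    R[i, i + 1:] = np.array(row) / 100.0   # restore decimal points
R = R + R.T                                # symmetric matrix, zero diagonal

def first_centroid_loadings(R):
    """First centroid factor: column sums divided by sqrt of the grand total."""
    R = R.copy()
    np.fill_diagonal(R, 0.0)
    # Assumed communality estimate: highest correlation in each column.
    np.fill_diagonal(R, np.abs(R).max(axis=0))
    col_sums = R.sum(axis=0)
    return col_sums / np.sqrt(col_sums.sum())

loadings = first_centroid_loadings(R)
print(np.round(loadings, 2).tolist())
```

Under these assumptions the loadings round to .70, .72, .73, .83, .80, .75, .79, .76, .75, and .70, which agrees with the legible entries in column I of Table 3.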


Interpretation of the Factors

Factor I derives its highest loadings from the four trials on the Six-
Target Rudder Control Test. These trials seem to measure this factor
uniquely. In these trials, it will be recalled, the subject must shift the

TABLE 3

Centroid Factor Loadings of Intra-Task Performance on the Standard and Six-Target
Rudder Control Tests, when the Standard Model is Administered First*

                                             Factors
Test                           Trial     I    II   III    IV    h²

 1. Standard Rudder Control        1    70    41   -25    18    75
 2. Standard Rudder Control        2    72    44   -26    11    79
 3. Standard Rudder Control        3    73    35   -17    14    71
 4. Standard Rudder Control        4    83    30    11    14    81
 5. Standard Rudder Control        5    80    21    27    08    76
 6. Standard Rudder Control        6    75    16    29    19    71
 7. Six-Target Rudder Control      1    79   -39    12   -16    82
 8. Six-Target Rudder Control      2    76   -47    07   -09    81
 9. Six-Target Rudder Control      3    75   -46    08   -18    81
10. Six-Target Rudder Control      4    70   -45    13   -12    73

    Σa²/k                               57    14    04    02

*Decimal points are omitted.
TABLE 4

Rotated Factor Loadings of Intra-Task Performance on the Standard and Six-Target
Rudder Control Tests, when the Standard Model is Administered First*

                                             Factors†
Test                           Trial     I    II   III    IV    h²

PM 8С 8 E 
1. Standard Rudder Control 1 36 78 " » 5 
2. Standard Rudder Control Ae о и | 
3. Standard Rudder Control 3 43 : 33 00 M 
4. Standard Rudder Control 4 58 Gn 40 o 
5. Standard Rudder Control 5 63 2) 4 gs Ма 
6. Standard Rudder Control б и Ыры, Жр Ac As 
7. Six-Target Rudder Control 1 DIM sl EET = 
8. Six-Target Rudder Control 2 n 05 ү ^ a 
9. Six-Target Rudder Control 3 90 05 p x ES 
10. Six-Target Rudder Control 4 ES E» D i 78 

a²/k 47 25 05 00

*Decimal points are omitted.
†Factors are identified as follows: I. Precision of Movement; II. Steadiness-Control; III. Strength;
IV. Residual.




apparatus to as many successively presented targets as possible during the 
testing period. The primary task, then, is to move the apparatus as precisely 
as possible with controlled movements of the foot pedals. For this reason, 
this factor has been labeled “Precision of Movement Under Speed Conditions.”
This interpretation seems to be confirmed by the relatively high but somewhat
lower loadings on the factor evidenced by the last three trials of the
Standard Rudder Control Test. This seems plausible since the standard
administration condition of this test changes at the beginning of trial 4. 
In trials 4 through 6, the task changes from a single-target task to a three- 
target task, in which the subject must not only hold the apparatus on target, 
but must also shift to one or the other of three targets when they are presented 
during each trial. The relatively low but significant loadings of the first 
three trials of the Standard Test on this factor may be explained by the fact 
that at the start of each of these trials, the subject must still move the apparatus 
from a side position to the single center target when it is presented. The 
extent to which he can do this precisely thus contributes in some degree to 
the total time the apparatus is held on target during each of these trials. 

Factor II is common only to the six trials of the Standard Rudder Control 
Test. Moreover, the highest loadings on this factor are derived from the 
first three trials of this test. In these trials, the subject’s only task, once 
he gets on target, is to hold the apparatus steadily on target. For this reason,
this factor has been called “Steadiness-Control.” This interpretation is also 
supported by the substantial but somewhat lower loadings on this factor of 
the last three trials of the Standard Test. Although in these trials the subject 
is presented successively with three targets to which he must shift the appara- 
tus as each appears, he is still required to hold the apparatus on each target 
from 7.5 to 27.5 seconds before a new target can be presented. 

Moreover, this factor does not appear in the four trials of the Experimental
Six-Target Test. Since in this test it is necessary for the subject to
hold the apparatus on target only three seconds before a new target is presented,
further support is obtained for the interpretation that this factor
represents steadiness of some kind. 

Factor III is common only to the last three trials of the Standard Rudder 
Control Test. Interpretation of this factor is thus somewhat more difficult. 
There is reason to believe, however, that this factor represents a strength 
function. In these three trials the subject is presented with three targets. 
When either one of the side targets is presented, the subject must hold the 
apparatus steadily at a rather difficult angle. This involves considerable 
muscular tension, especially in the legs, in order to keep the apparatus lined 
up in these positions. It will be recalled that during these three trials these 
positions must often be maintained as long as 27.5 seconds on certain settings. 
Moreover, the burden of keeping the apparatus on target is concentrated 
mostly on one leg at a time in these positions. The combination of such
long-time delays together with side target angles is a task situation which
appears only in these three trials. It was thus decided to call this factor
“Strength.”

Factor IV is a residual factor, containing only insignificant loadings.


Factor Analysis of Trial Intercorrelations when the Six-Target Test is 
Administered First 
A similar analysis was made of the matrix of inter-trial correlations
(Table 2) obtained when the Six-Target Rudder Control Test was administered
before the Standard Model. This was done independently of the
first analysis. Table 5 presents the centroid factor matrix obtained. Again,
four factors were extracted and the axes rotated to simple structure. Table 6
presents the orthogonal solution of rotated factor loadings.

TABLE 5

Centroid Factor Loadings of Intra-Task Performance on the Standard and Six-Target
Rudder Control Tests, when the Six-Target Model is Administered First*

                                            Factors
Test Trial                             I     II    III     IV     h²

 1. Six-Target Rudder Control 1       68    -52    -20     16     81
 2. Six-Target Rudder Control 2       77    -55      ?    -18     92
 3. Six-Target Rudder Control 3       82    -4?     12    -16     93
 4. Six-Target Rudder Control 4       81    -42     16    -18     88
 5. Standard Rudder Control 1         65     30     18     14     55
 6. Standard Rudder Control 2         58     35     17     11     50
 7. Standard Rudder Control 3         56     42     26     11     57
 8. Standard Rudder Control 4         7?     33     13     16     66
 9. Standard Rudder Control 5         64     36     20     12     60
10. Standard Rudder Control 6         58     24     19     13     45

a²/k                                  47     17     03     02

*Decimal points are omitted.

This analysis has confirmed the same factor pattern as that found in the
first analysis. Factor I is the “Precision of Movement” factor with the
strongest loadings on the four trials of the Six-Target
Test. Factor II is “Steadiness-Control” and again is common only to the
Standard Test, with highest loadings on the first three trials. Factor III
is the factor previously called “Strength” and again is found only in the last
three trials of the Standard Rudder Control Test. Factor IV is a residual
factor, but a doublet appears in the loadings of the first
two trials of the Six-Target Model. This may be an initial adjustment




TABLE 6

Rotated Factor Loadings of Intra-Task Performance on the Standard and Six-Target
Rudder Control Tests, when the Six-Target Model is Administered First*

                                            Factors†
Test Trial                             I     II    III     IV     h²
                                      PM     SC      S      R

 1. Six-Target Rudder Control 1       82     01     03     38     81
 2. Six-Target Rudder Control 2       92     02    -02     27     92
 3. Six-Target Rudder Control 3       96     03     02    -01     93
 4. Six-Target Rudder Control 4       94     01     08     02     88
 5. Standard Rudder Control 1         35     65     04     02     55
 6. Standard Rudder Control 2         26     66     02    -03     50
 7. Standard Rudder Control 3         20     73    -04    -07     57
 8. Standard Rudder Control 4         41     63     31     06     66
 9. Standard Rudder Control 5         35     58     37     03     60
10. Standard Rudder Control 6         36     47     31     07     45

a²/k                                  40     23     03     02

*Decimal points are omitted.
†Factors are identified as follows: I. Precision of Movement; II. Steadiness-Control; III. Strength;
IV. Residual.


factor of some kind present in this more difficult test when it is given first.
On the basis of this limited evidence, however, no additional significance
can be attributed to this factor.
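
The h² column and the a²/k row of Table 6 can be checked directly from the rotated loadings. The sketch below is an illustration only, not part of the original analysis: it assumes that h² is the row sum of squared loadings and that a²/k is the column sum of squared loadings divided by the number of trials (k = 10). The variable and function names are hypothetical.

```python
# Rotated loadings from Table 6 with decimal points restored (columns I-IV).
loadings = [
    [0.82, 0.01, 0.03, 0.38],    # Six-Target trials 1-4
    [0.92, 0.02, -0.02, 0.27],
    [0.96, 0.03, 0.02, -0.01],
    [0.94, 0.01, 0.08, 0.02],
    [0.35, 0.65, 0.04, 0.02],    # Standard trials 1-6
    [0.26, 0.66, 0.02, -0.03],
    [0.20, 0.73, -0.04, -0.07],
    [0.41, 0.63, 0.31, 0.06],
    [0.35, 0.58, 0.37, 0.03],
    [0.36, 0.47, 0.31, 0.07],
]
# Communalities as printed in the h² column of Table 6.
reported_h2 = [0.81, 0.92, 0.93, 0.88, 0.55, 0.50, 0.57, 0.66, 0.60, 0.45]


def communalities(rows):
    """Row sums of squared loadings (assumed definition of h²)."""
    return [sum(a * a for a in row) for row in rows]


def variance_proportions(rows):
    """Column sums of squared loadings divided by the number of trials."""
    k = len(rows)
    return [sum(row[j] ** 2 for row in rows) / k for j in range(len(rows[0]))]


h2 = communalities(loadings)
props = variance_proportions(loadings)
# Largest disagreement between recomputed and printed communalities.
max_gap = max(abs(c - r) for c, r in zip(h2, reported_h2))
```

Because the printed loadings are rounded to two places, the recomputed communalities agree with the printed ones only to within about a hundredth.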


Discussion 


The results have indicated that three primary factors account for intra-task
performance on the Standard and Six-Target Rudder Control Tests.
Moreover, the same factor pattern is found for the two tests regardless of
their order of administration. The Six-Target Test appears least complex
factorially, with practically all the variance accounted for by the “Precision
of Movement” factor. On the other hand, the Standard Test is factorially
complex, and contains all three factors. The last three trials of this test
are the most complex, with substantial loadings on all three factors. The
first three trials on this test provide the best measure of the “Steadiness-Control”
factor, but also have some variance contributed by the “Precision
of Movement” factor.

The results also show that the communalities of the Six-Target Test
remain high for each administration order, but the communalities of the
trials of the Standard Test are somewhat lower when this test is administered
second. This unexplained variance suggests that additional factors may
still be present in the Standard Test under these conditions.




The correlation between the total scores on each of the two tests appears 
primarily explainable in terms of the only factor common to the two tests, 
Precision of Movement. The fact that a lower correlation is obtained when 
the Standard Model is presented second is partially attributable to the lower 
loadings of the Standard Model trials on this factor when it is given second and 
to the additional specific variance which appears in this administration
order. Since the factor pattern is the same for each trial in each administra- 
tion order, the correlation change is not attributable to the appearance of 
different combinations of these factors at different stages in each task situation. 

These results have certain implications for future test development.
Since the Rudder Control Test has generally been found to be the most 
valid test in the Aircrew Battery, it may now be possible to find out just 
what it is in the test that contributes most of the valid variance. It may be 
found that each of these components is differentially valid in predicting 
pilot success, or that practically all the valid variance is contributed by only 
one of the factors. If the latter is true, then it would be important to exclude 
from the test those factors which are not valid, and to emphasize the measure- 
ment of the valid factor at the expense of the less valid ones in future models 
of the test. This could presumably lower the correlation of the test with 
other tests in the battery and could exclude much of the invalid variance 
that is now weighted into the composite stanine along with the valid variance
in the test.

If, on the other hand, more than one factor were found to contribute
valid variance, then separate sub-test scores could be derived for each factor
and each separate score could be weighted appropriately. For example,
scores on the first three trials of the Standard Test would be the best available
measure of the “Steadiness-Control” factor. Scores on the four trials of the
Experimental Six-Target Rudder Control Test give the strongest and most
pure measure of the “Precision of Movement” factor found in the Standard
Test. After the effects of the other two factors are partialled out, the last
three trials of the Standard Test would provide the best measure of the
“Strength” factor.

The next step would be to include criterion scores in the correlation
matrix of trials, and to repeat the factor analyses. The loadings of the criterion
scores on each factor should indicate the unique contribution to be
expected to the validity from each factor.

The study also provides suggestions for the design of new models of
the Rudder Control Test for further experimental or operational use. It
would be possible to design the test in order to provide three alternative
administration conditions, each condition emphasizing the measurement of
one of the factors. Separate scores could be derived from each “sub-test.”

The present results indicate that trials 1-3 of the Standard Test
provide the best measure of Factor II, Steadiness-Control. However,
these trials also have some loadings on Factor I, Precision of Movement. 
It would be possible to design the test to minimize the initial movement 
required during these trials. As the test is now constituted, the apparatus 
rests in an extreme side position when the test is not in operation and during 
rest periods. By providing a slightly off-center rest position during the first
three trials in which the subject’s task is to “stay on” on the center target,
the loadings on Factor I would be expected to decrease, leaving a more
pure measure of “Steadiness-Control” in these trials.

Since the four trials of the Six-Target Test appear to provide the most 
pure measure of the "Precision of Movement" factor, it seems reasonable 
that six targets might be introduced in a new model of the test instead of 
the factorially complex three-target procedure now used. Administration 
of the six-target trials could be under the same self-pacing conditions as 
that followed in the experimental model, with the same short-time delays 
between new target presentations. 

With respect to providing for maximization of the “Strength” factor 
during certain trials of new models of the test, the following possibilities are 
indicated. This factor appears in the three-target Standard Test where 
the subject must often hold the apparatus for long-time delays on one or 
the other of the side targets before a new target is presented. If the hypothe- 
sis about this factor is tenable, it is conceivable that the factor might be 
maximized by requiring the subject to hold the apparatus on more widely 
spaced side targets for somewhat longer time delays. This could easily be 
provided for by incorporating this procedure in conjunction with the six- 
target panel suggested previously. During these trials, however, only the 
extreme targets would be used and the time delay would be longer (e.g., 
20-28 seconds). In addition, increasing the spring tension on the foot pedals 
would presumably give further emphasis to this factor. 

It would be possible for a re-designed test to combine these features
by providing for three trials in which a center target only is used, three 
trials in which the six targets are used under self-pacing conditions, and 
three trials in which only the extreme side targets are used with long-time 
delays. An apparatus of this type would allow further confirmation of the 
factors isolated in this study, if a factor analysis of performance on the new 
apparatus increased or reduced trial factor loadings in the expected directions. 
Moreover, each sub-task would presumably be in a more factorially “pure”
form for further validation study against pilot success.

From a methodological point of view, the study indicates that the application
of factorial methods to intra-task performance is another fruitful
approach to the study of the nature of aptitude tests in the psychomotor area.
Although factor analysis methods have been more recently utilized in defining
homogeneous sub-tests among items within printed tests, little application
has been made of these methods to performance within apparatus tests.




The present study seems to indicate that results of such analyses can contribute
to the isolation of important variables and can suggest leads for
test improvement in this aptitude area.

The interpretations in this study are, of course, restricted by the limited
number of variables in the analysis. Future studies might well include
additional variables whose factorial content is well-established. Although
the present study was aimed at investigating factors involved during performance
of a test in its operational setting, future studies might also include
factor analyses of extended practice on psychomotor tasks. Such studies
should lead to a better understanding of the influence of systematic changes
in function involved at different stages in performance of psychomotor tasks.


REFERENCES

1. Fleishman, E. A., and Reynolds, B. Comparative data on the Standard Rudder
   Control Test and the Experimental Six-Target Rudder Control Test. Research
   Note P&MS 52-4, Human Resources Research Center, Lackland Air Force Base,
   San Antonio, Texas, 1952.
2. Guilford, J. P. (Ed.) Printed classification tests. AAF Aviation Psychology Program
   Research Report, No. 5. Washington: U. S. Govt. Printing Office, 1947.
3. Melton, A. W. (Ed.) Apparatus tests. AAF Aviation Psychology Program Research
   Report, No. 4. Washington: U. S. Govt. Printing Office, 1947.
4. Perceptual and Motor Skills Research Laboratory. A six-target rudder control task.
   Research Note P&MS 50-3, Human Resources Research Center, Lackland Air Force
   Base, San Antonio, Texas, 1950.


Manuscript received 7/1/52


Revised manuscript received 8/8/52 


PSYCHOMETRIKA—VOL. 18, No. 1 
MARCH, 1953 


AN APPLICATION OF CONFIDENCE INTERVALS 
AND OF MAXIMUM LIKELIHOOD TO THE 
ESTIMATION OF AN EXAMINEE'S ABILITY* 


FREDERIC M. LORD
EDUCATIONAL TESTING SERVICE 


A mathematical definition of the theoretical relation between the examinee’s
actual responses to the test items and his “true ability” is selected. A
maximum-likelihood solution is obtained for estimating the examinee’s
“true ability” from his responses to the items. The standard error of the
maximum-likelihood estimate is obtained, its relation to the discriminating
power of the test is pointed out, and some generalizations are drawn as to
the optimum level of item difficulty. The Neyman-Pearson power function
is applied to determine which of two psychological tests is the most powerful
for the selection of “successful” examinees.


When we use the usual type of mental test score to measure the ability 
of the examinees in a group, the metric supplied by the test scores cannot be
considered satisfactory. The inadequate nature of this metric is apparent
when we consider that two tests measuring the same ability, administered 
to the same group of examinees, may yield two score distributions of entirely 
different shapes. The metric of “true” scores obtained from very long tests 
is as subject to this objection as is the metric of fallible scores obtained from 
short tests. We here propose to use a more adequate metric for measuring 
the ability underlying the test score—a metric that will remain invariant 
from test to test—and to investigate what may be learned from a maximum- 
likelihood approach to the problem of estimating the examinee's ability, as 
defined by this metric, and from certain other related approaches of modern 
statistical theory. The reader is warned that, in view of the heavy (but not 
insuperable) computational difficulties in the way of any practical applica- 
tion, the present discussion is directed chiefly towards determining what 
conclusions of general theoretical significance can be drawn from a consideration
of the proposed metric.

*The author is indebted to Dr. John W. Tukey for helpful comments on a draft
of the present manuscript.

If the response of an examinee to each of $n$ test items may be scored either
0 or 1, we may denote the score of examinee $a$ on item $i$ by $x_{ia}$
($i = 1, 2, \ldots, n$; $a = 1, 2, \ldots, m$), where $x_{ia}$ equals either 0 or 1.
For any effective test, the logic of the practical situation implies some relation,
in general, between the probability that $x_{ia} = 1$, which we will denote by
Prob ($x_{ia} = 1$), and Prob ($x_{ja} = 1$), where $j$ denotes some item other than
item $i$; and also some relation between Prob ($x_{ia} = 1$) and Prob ($x_{ib} = 1$),
where $b$ denotes some examinee other than examinee $a$. If these relations can be
suitably specified mathematically, we can apply Fisher's method of maximum likelihood
(25, 133-142; 16, 152-161) so as to obtain from the data on any actual set
of examinees’ answer sheets maximum-likelihood estimates of the parameters
describing the test items and of the parameter describing the ability
of each individual examinee.

Once the parameters to be estimated have been satisfactorily specified
and maximum-likelihood estimates for these parameters derived, a large
body of standard mathematical theory can be brought to bear on many
unresolved problems in testing. The maximum-likelihood estimate of the
examinee's ability itself constitutes an answer to the question of how the
items should be weighted in obtaining the examinee’s total score. The
discriminating power of the test at different ability levels can then, in
large-sample theory, be measured by the usual standard error of the
maximum-likelihood estimate. The examinee’s responses to the test items may be
used to set up a confidence interval (16, Ch. 11) within which the examinee's
true ability may be assumed to lie. The length of this confidence interval
provides a measure of the test’s discriminating power at a given level of actual
test score. If it is desired to build a test that will have maximum discriminating
power at a given cutting score, the problem of the optimum distribution
of item difficulties in such a test can be reduced, in large-sample theory, to a
question of determining for what values of the item difficulties the standard
error of the test score is a minimum. Finally, if it is desired to test some
hypothesis (for example, that a given examinee's true ability is above rather
than below a given value), the Neyman-Pearson theory of testing hypotheses
(25, 152 ff; 16, Ch. 12) can be brought to bear; for example, the power function
of any test score used to check this hypothesis can be determined and
compared with the power functions of scores on other psychological tests in
order to determine what sort of psychological test would be best for this
purpose.


I. The Case of Free-Response Items 


We will here assume, as have Guilford (6), Richardson (19), Mosier
(17, 18), Ferguson (3), Lawley (9, 10), Lorr (15), Tucker (23, 24), Lord
(13, 14), Cronbach and Warrington (2), and others, that the probability
($P$) that an examinee will answer an item correctly is a normal ogive function
of his “true ability” ($c$) in the area measured by the test:

$$P_{ia} = \int_{(h_i - R_i c_a)/K_i}^{\infty} N(c)\, dc. \tag{1}$$

Here $c_a$ is a population parameter measuring the true ability of examinee $a$;
$h_i$ and $R_i$ are population parameters describing item $i$, as theoretically
determinable by administering the test to an infinitely large population of
examinees; $K_i$ is a function of $R_i$, as defined by the relation

$$K_i = \sqrt{1 - R_i^2} \tag{2}$$

(it being assumed that $R_i \neq 1$); and the function $N(c)$ represents the normal
frequency function

$$N(c) = \frac{1}{\sqrt{2\pi}}\, e^{-c^2/2}.$$

It is assumed, given the true values of $h_i$, $R_i$, and $c_a$, that formula (1) for
$P_{ia}$ is not altered by the presence of additional information about the
performance of examinee $a$ on items other than item $i$ or about the performance
on item $i$ of examinees other than examinee $a$.

These assumptions are reasonable ones for the case where the test items
cannot be answered correctly by guessing. An empirical study (13) indicates
that these two assumptions may be used with considerable confidence for
certain types of test material. Known methods are available in theory
(although they may be excessively cumbersome in practice) for determining
whether or not any given set of item-response data is compatible with the
assumptions made. This may be done in large-sample theory by (a) estimating
the unknown parameters $h_i$, $R_i$, and $c_a$ from the data by maximum-likelihood
methods and (b) using chi-square to determine whether the observed
data could reasonably be assumed to have arisen by random sampling from
a population of the type implied by equation (1) and characterized by the
estimated values of the parameters.
Equation (1) may be readily obtained, if desired, from the assumptions
that

(a) $c$ is normally distributed,
(b) there is a continuous variable ($x_i'$) underlying the item,
(c) $c$ and $x_i'$ have a joint normal distribution with the correlation $R_i$, and
(d) $P_{ia}$ is the probability that $x_i' > h_i$ when $c = c_a$.

Under these assumptions, equation (1) for $P_{ia}$ follows directly from the
definition of $P_{ia}$ given in (d).

If the ability $c$ is normally distributed in the population of examinees,
it may be shown (23) that $R_i$ is the population biserial correlation between
$x_i$ and $c$; that $h_i$ is a measure of item difficulty related to the proportion ($p_i$)
of examinees in the population who answer item $i$ correctly by the equation

$$p_i = \int_{h_i}^{\infty} N(c)\, dc; \tag{3}$$

and (24) that $c$ is the common factor of the tetrachoric item intercorrelations.
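
The step from assumptions (a) through (d) to equation (1) can be spelled out. The following is a sketch only; it assumes, as the text appears to imply but does not state at this point, that both $c$ and $x_i'$ are expressed in standard-score form, so that the conditional distribution of $x_i'$ for fixed $c$ is normal with mean $R_i c$ and variance $1 - R_i^2 = K_i^2$.

```latex
% Assumption (not stated explicitly in the text): c and x_i' are standardized.
x_i' \mid (c = c_a) \;\sim\; \mathcal{N}\!\left(R_i c_a,\; K_i^2\right),
\qquad K_i^2 = 1 - R_i^2 ,
\\[6pt]
P_{ia} \;=\; \operatorname{Prob}\left(x_i' > h_i \mid c = c_a\right)
       \;=\; \operatorname{Prob}\!\left(\frac{x_i' - R_i c_a}{K_i} > \frac{h_i - R_i c_a}{K_i}\right)
       \;=\; \int_{(h_i - R_i c_a)/K_i}^{\infty} N(c)\, dc .
```

Averaging over a standardized normal $c$ leaves $x_i'$ itself standard normal, which gives $p_i$ as the area above $h_i$, in agreement with equation (3).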




Since c is the common factor of the items, it will remain invariant from 
one test to another measuring the same ability. The variable c thus provides 
for measuring the examinee’s ability the desired metric that remains invariant 
no matter what test of that ability is administered. This invariance, and the 
developments on the following pages, will hold whether or not c is normally 
distributed in the group tested. 

$P_{ia}$ has been called the “item characteristic function.” The relation
between $P_{ia}$ and $c_a$, as represented by equation (1), is illustrated in Figure
1 by a selected example.


FIGURE 1
Item Characteristic Curve When $h_i = -.562$ and $R_i = .531$
(Vertical axis: Probability of a Correct Answer ($P$); horizontal axis: Ability ($c$).)
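
The curve of Figure 1 can be tabulated numerically. The sketch below is illustrative only: it evaluates equations (1) through (3) with Python's standard-library normal distribution, and the function names are hypothetical rather than taken from the paper.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal curve, N(c) in the text


def item_characteristic(c, h, R):
    """Equation (1): probability of a correct answer at ability c."""
    K = sqrt(1.0 - R * R)                 # equation (2)
    # Area above (h - R*c)/K equals area below (R*c - h)/K by symmetry.
    return _STD.cdf((R * c - h) / K)


def proportion_correct(h):
    """Equation (3): proportion of a normal population passing the item."""
    return _STD.cdf(-h)


h_i, R_i = -0.562, 0.531                  # the item shown in Figure 1
curve = [(c, item_characteristic(c, h_i, R_i)) for c in (-3, -2, -1, 0, 1, 2, 3)]
values = [p for _, p in curve]
```

For this item the probability of a correct answer rises steadily with ability and is roughly .75 for an examinee of average ability ($c = 0$).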


II. The Maximum-Likelihood Estimates


Suppose we administer a sample of $n$ test items to a sample of $m$ examinees,
obtaining $mn$ item responses, i.e., $mn$ values of $x_{ia}$. Under the
assumptions made, the frequency function for each $x_{ia}$ may be written

$$f(x_{ia}) = P_{ia}^{x_{ia}}\, Q_{ia}^{1 - x_{ia}}, \tag{4}$$

where $Q_{ia} = 1 - P_{ia}$. The likelihood of the sample of $mn$ observed values
is therefore

$$L = \prod_{a=1}^{m} \prod_{i=1}^{n} P_{ia}^{x_{ia}}\, Q_{ia}^{1 - x_{ia}}. \tag{5}$$


We wish to use the method of maximum likelihood in order to determine
what values of the unknown population parameters ($h_i$, $R_i$, and $c_a$) will
maximize the likelihood of the $mn$ observed values. This problem is distinct
from, although related to, the usual applications of the maximum-likelihood
method to probit analysis, as treated, for example, by Finney (4).

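We will need the following derivatives:

$$\frac{\partial P_{ia}}{\partial h_i} = -\frac{N_{ia}}{K_i}, \tag{6}$$

$$\frac{\partial P_{ia}}{\partial R_i} = (c_a - h_i R_i)\,\frac{N_{ia}}{K_i^3}, \tag{7}$$

$$\frac{\partial P_{ia}}{\partial c_a} = \frac{R_i N_{ia}}{K_i}, \tag{8}$$

where

$$N_{ia} = N\!\left(\frac{h_i - R_i c_a}{K_i}\right).$$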
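Now the logarithm of the likelihood is

$$\log L = \sum_{a} \sum_{i} \left[x_{ia} \log P_{ia} + (1 - x_{ia}) \log Q_{ia}\right]. \tag{9}$$

Taking the appropriate derivatives of (9), we have

$$\frac{\partial \log L}{\partial h_i} = -\frac{1}{K_i} \sum_{a} \left[\frac{x_{ia}}{P_{ia}} - \frac{1 - x_{ia}}{Q_{ia}}\right] N_{ia} \qquad (i = 1, 2, \ldots, n); \tag{10}$$

$$\frac{\partial \log L}{\partial R_i} = \frac{1}{K_i^3} \sum_{a} \left[\frac{x_{ia}}{P_{ia}} - \frac{1 - x_{ia}}{Q_{ia}}\right] (c_a - h_i R_i)\, N_{ia} \qquad (i = 1, 2, \ldots, n); \tag{11}$$

$$\frac{\partial \log L}{\partial c_a} = \sum_{i} \left[\frac{x_{ia}}{P_{ia}} - \frac{1 - x_{ia}}{Q_{ia}}\right] \frac{R_i N_{ia}}{K_i} \qquad (a = 1, 2, \ldots, m). \tag{12}$$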
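The next step is to set equations (10), (11), and (12) each equal to zero,
placing a circumflex (^) over those symbols that involve the population
parameters, $h_i$, $R_i$, and $c_a$, to indicate that these parameters have been
replaced by the sample estimates $\hat h_i$, $\hat R_i$, and $\hat c_a$. For example, letting

$$\hat g_{ia} = \frac{\hat h_i - \hat R_i \hat c_a}{\hat K_i}, \tag{13}$$

we will write

$$\hat N_{ia} = N(\hat g_{ia}), \tag{14}$$

$$\hat P_{ia} = \int_{\hat g_{ia}}^{\infty} N(c)\, dc. \tag{15}$$

From equation (11) we may therefore write

$$\frac{1}{\hat K_i^3} \sum_{a} \left[\frac{x_{ia}}{\hat P_{ia}} - \frac{1 - x_{ia}}{\hat Q_{ia}}\right] \hat c_a \hat N_{ia}
 \;-\; \frac{\hat h_i \hat R_i}{\hat K_i^3} \sum_{a} \left[\frac{x_{ia}}{\hat P_{ia}} - \frac{1 - x_{ia}}{\hat Q_{ia}}\right] \hat N_{ia} = 0. \tag{16}$$

It may be seen from equation (10) that the term under the second summation
sign in (16) vanishes.

Using $\sum_i^{(1)}$ and $\sum_i^{(0)}$ to denote summation over all items for which
$x_{ia} = 1$ or 0, respectively, for a given $a$, and using $\sum_a^{(1)}$ and $\sum_a^{(0)}$
to denote summation over all examinees for whom $x_{ia} = 1$ or 0, respectively,
for a given $i$, we may rewrite equations (10), (16), and (12), respectively, as follows:

$$\sum_a^{(1)} \frac{\hat N_{ia}}{\hat P_{ia}} = \sum_a^{(0)} \frac{\hat N_{ia}}{\hat Q_{ia}} \qquad (i = 1, 2, \ldots, n); \tag{17}$$

$$\sum_a^{(1)} \frac{\hat c_a \hat N_{ia}}{\hat P_{ia}} = \sum_a^{(0)} \frac{\hat c_a \hat N_{ia}}{\hat Q_{ia}} \qquad (i = 1, 2, \ldots, n); \tag{18}$$

$$\sum_i^{(1)} \frac{\hat R_i \hat N_{ia}}{\hat K_i \hat P_{ia}} = \sum_i^{(0)} \frac{\hat R_i \hat N_{ia}}{\hat K_i \hat Q_{ia}} \qquad (a = 1, 2, \ldots, m). \tag{19}$$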


Equations (17), (18), and (19) theoretically can be solved so as to determine
the desired values of $\hat c_a$, $\hat h_i$, and $\hat R_i$. The scale for $\hat c_a$ may be
determined by imposing the conditions that $\sum_a \hat c_a = 0$ and $\sum_a \hat c_a^2 = m$.
The value of $\hat c_a$ is the maximum-likelihood estimate of $c_a$, the measure of the
examinee’s ability. In particular, if $c_a$ is normally distributed for any given group of
examinees, then $\hat h_i$ is the maximum-likelihood estimate of the measure of
item difficulty defined by equation (3), and $\hat R_i$ is the maximum-likelihood
estimate of the biserial correlation of the item with ability. Unfortunately
in practice both $n$ and $m$ are usually moderately large numbers, so that any
iterative solution of equations (17), (18), and (19) for values of $\hat c_a$, $\hat h_i$,
and $\hat R_i$ will usually be lengthy by ordinary methods.

The item difficulties routinely obtained in item analyses are, of course,
approximations to the values of $p_i$, and approximations to the values of
$h_i$ may thus be readily obtained by means of equation (3). Similarly the
item-test biserial correlations obtained in item analysis may be adjusted to
obtain approximations to the values of $R_i$. Ways of refining these approximations
may suggest themselves (13). In any case, if we assume that
the values of $h_i$ and $R_i$ for a given set of items have already been determined
to an adequate approximation by a prior investigation, it would be feasible
in actual practice to estimate the measure of a given examinee's ability from
his responses to the test items, using some method of successive approximation
to solve equation (19) for $\hat c_a$.
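
One possible “method of successive approximation” is bisection on the derivative of the log likelihood with respect to $c_a$, equation (12), with the item parameters treated as known. The sketch below is an illustration, not the procedure used in the paper. The function names, the search bracket of -6 to 6, and the clipping of $P$ away from 0 and 1 are all assumptions made for the example, and the items are assumed to have positive $R_i$.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal curve


def score(c, responses, h, R):
    """Derivative of log L with respect to c for one examinee, as in equation (12)."""
    total = 0.0
    for x, h_i, R_i in zip(responses, h, R):
        K = sqrt(1.0 - R_i * R_i)
        g = (h_i - R_i * c) / K
        # P is the area above g; clip it so the division below stays finite.
        P = min(max(_STD.cdf(-g), 1e-12), 1.0 - 1e-12)
        total += (x - P) * R_i * _STD.pdf(g) / (K * P * (1.0 - P))
    return total


def estimate_ability(responses, h, R, lo=-6.0, hi=6.0, tol=1e-8):
    """Bisection for the root of the score; the bracket [lo, hi] is an assumption."""
    if all(responses) or not any(responses):
        raise ValueError("no finite estimate for an all-right or all-wrong record")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, responses, h, R) > 0.0:
            lo = mid      # likelihood still increasing: root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is used here because the log likelihood of a normal-ogive model is concave in $c_a$, so its derivative crosses zero at most once. An examinee with every item right or every item wrong has no finite estimate, and the sketch rejects such a record.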


III. The Special Case of Equivalent Free-Response Items


In this and the following section, we will limit consideration to the
special case where all items are “equivalent,” i.e., are of equal difficulty and
are equally correlated with the ability measured. We will assume that $h_i$
and $R_i$ are known, at least to an adequate approximation. Since we will
consider only one examinee at a time, we will discontinue use of the subscript
$a$.

In this special case, only $x_{ia}$ need be kept under the summation sign in
equation (12); consequently, dropping the subscript $i$ from every symbol
except $x_i$, we have

$$\frac{\partial \log L}{\partial c} = \frac{R N}{K} \left(\frac{s}{P} - \frac{n - s}{Q}\right), \tag{20}$$

where $s = \sum_i x_i$ is the number of items answered correctly, i.e., where $s$
is the usual test score. Replacing $Q$ by $1 - P$, we find from (20) that

$$\frac{\partial \log L}{\partial c} = \frac{R N (s - nP)}{K P Q}. \tag{21}$$

If we set (21) equal to zero, we find

$$\hat P = \frac{s}{n}, \quad \text{i.e.,} \quad \int_{(h - R \hat c)/K}^{\infty} N(c)\, dc = \frac{s}{n}. \tag{22}$$

In the special case of “equivalent” items, we have thus found that $\hat c$, the
maximum-likelihood estimate of the examinee's ability, is a simple function of the
usual type of test score, $s$. The value of $\hat c$ may be readily determined in any
given case from equation (22) with the help of any standard table of areas
under the normal curve.

Figure 2 presents for illustrative purposes the relation between $\hat c$ and
$s/n$ for two tests composed of “equivalent” items of 50-per-cent difficulty,
the value of $R$ being .30 in one case, .60 in the other. For example, if an
examinee answers 60 per cent of the items correctly on the former test, our
estimate of his c-score is .81. This means that we estimate that his ability
would place him .81 standard deviations above the mean of the basic group
for which the values of $h$ and $R$ were calculated.




FIGURE 2
Ability Score ($\hat c$) as a Function of the Proportion of Correct Answers ($s/n$) on Tests
Composed of $n$ “Equivalent” Free-Response Items of 50-Per-Cent Difficulty Having
Specified Values of $R$
(Vertical axis: c-score; horizontal axis: proportion of correct answers ($s/n$).)

The curves in Figure 2 relate the c-score metric with the metric provided
by the usual test scores. In the case where $R = .30$, the relation of these
two scales is practically linear between $\hat c = -2.5$ and $\hat c = +2.5$ (scores
beyond $\pm 2.5$ will be obtained by only about 1 per cent of the examinees if
$c$ is normally distributed in the basic group). In general, however, as illustrated
by the curve for the case where $R = .60$, the difference between scores
($s/n$) of .50 and .60 represents less difference in underlying ability than does
the difference between scores of .80 and .90. This fact is the result of the
squeezing of the score scale at the extremes, arising from the impossibility
of obtaining scores below 0 or above 1.00.
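
Equation (22) can be inverted in closed form, which makes the relation shown in Figure 2 easy to reproduce. The sketch below is illustrative: the function name is hypothetical, and it takes $h = 0$ for items of 50-per-cent difficulty, which follows from equation (3) when $c$ is normally distributed.

```python
from math import sqrt
from statistics import NormalDist


def ability_from_score(prop_correct, h, R):
    """Solve equation (22), P-hat = s/n, for c-hat; requires 0 < s/n < 1."""
    K = sqrt(1.0 - R * R)
    # (R*c - h)/K must equal the normal deviate cutting off area s/n below it.
    return (h + K * NormalDist().inv_cdf(prop_correct)) / R


# Worked example from the text: 60 per cent correct on the test with R = .30.
example = ability_from_score(0.60, 0.0, 0.30)

# Equal steps in s/n near the middle and near the top of the scale, R = .60.
gap_mid = ability_from_score(0.60, 0.0, 0.60) - ability_from_score(0.50, 0.0, 0.60)
gap_high = ability_from_score(0.90, 0.0, 0.60) - ability_from_score(0.80, 0.0, 0.60)
```

Rounded to two places, `example` reproduces the c-score of .81 quoted in the text, and `gap_high` exceeds `gap_mid`, which is the squeezing of the score scale at the extremes.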




FREDERIC M. LORD 65 


А further interesting theoretical result about the test score is apparent 
upon examination of the frequency function of Tia , зо, °** , Х,а for the је 
2 › Una ; P 


of “equivalent” items: 


dis i Toa m irit) = о" 


Erie пориче 
=P 0 
= Р"д“". (23) 
in the case of а test composed of “equivalent” items 
B B H B , 
e (s) is а sufficient stalistic (16, 151) for estimating 


and hence that the test score contains all the informa- 
ће examinee's ability, contained by the exami- 


It is seen from (23) that, 
the usual type of test scor 
the examinec's ability (с); 
lion, relevant to the estimation of t 


nee’s responses to the test items. 
In order to obtain a useable standard error of ĉ for the case of “equivalent” 


items, we must agree in advance that we will never assign an infinitely large 

value of é to any examinee. When any examinee either answers all items 

correctly or answers all items incorrectly, there is no finite value of é, that 

will satisfy equation 22;* in such cases we will assign some arbitrary value, 

С, = 1,000 0r ĉ = — 1,000, for example. When n is large, such an occurrence 
ilts will not be affected by this manipulation. 


With this understanding, the sampling error of ĉ is approximately
(16, 208 ff.)

\[
\mathrm{S.E.}_{\hat{c}} = \left[\, n\, E\!\left( \frac{\partial \log f(x)}{\partial c} \right)^{2} \right]^{-1/2}, \tag{24}
\]

where the operator E indicates that the expected value is to be taken.
Substituting the expression given in equation (4) for f(x) into equation (24) and
differentiating, we obtain

\[
\mathrm{S.E.}_{\hat{c}} = \frac{1}{\sqrt{n}} \left[ \frac{N^{2} R^{2}}{K^{2} P^{2} Q^{2}} \, E(x_i - P)^{2} \right]^{-1/2}. \tag{25}
\]

Now P is the expected value of $x_i$, and consequently $E(x_i - P)^2 = PQ$,
the variance of the binomial distribution when n = 1.

We thus have the final result that the sampling error of our estimate
of the examinee's ability is, for the case of equivalent items,

\[
\mathrm{S.E.}_{\hat{c}} = \frac{K \sqrt{PQ}}{R \, N \sqrt{n}}. \tag{26}
\]

*Dr. Tukey suggests defining ĉ by the equation P = (s + 1/2)/(n + 1) in order
to avoid this difficulty. See Freeman, M. F., and Tukey, J. W. Transformations
related to the angular and the square root. Ann. math. Statist., 1950, 21, 607-610.




The thought suggests itself that this standard error may be used as a
measure of the discriminating power of the test for examinees at a given level
of ability. As a matter of fact, the mathematical expression for S.E.(ĉ) turns
out to be identical with the reciprocal of the expression for a discrimination
index previously developed (13) from an entirely different line of reasoning.

S.E.(ĉ) may be used to set up confidence intervals within which the true
value of c may be assumed to lie. Since confidence intervals obtained from
maximum-likelihood estimates are known to be asymptotically shortest,
unbiassed confidence intervals, the length of such a confidence interval might
well be taken as a measure of the discriminating power of the test at a given
level of test score. Illustrations of such confidence intervals will be given in a
later section.


TABLE 1

The Standard Error of ĉ for a Test Composed of n “Equivalent” Free-Response Items,
for Specified Values of P and R

    P            R       S.E.
    .5          .30     3.98/√n
                .60     1.67/√n
    .4 or .6    .30     4.03/√n
                .60     1.69/√n
    .3 or .7    .30     4.19/√n
                .60     1.76/√n
    .2 or .8    .30     4.54/√n
                .60     1.90/√n
    .1 or .9    .30     5.44/√n
                .60     2.28/√n


Table 1 presents a few standard errors calculated from equation (26) for
illustrative purposes. The relation between P and S.E.(ĉ) for the case when
R = .30 is also shown by the curve labeled k = ∞ in Figure 3.

We note that the standard error of the ĉ-score is proportional to √(PQ)/N,
a quantity that increases as P departs from .50. For large n, the length of
the confidence interval within which the true ability of the examinee may be
assumed to fall will be proportional to S.E.(ĉ). We thus see, at least for large
n, that a given examinee's ability (c) can be estimated more accurately by
administering items that are of 50-per-cent difficulty for examinees like him than
by administering items at any other single difficulty level.
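
The entries of Table 1 can be recomputed as a check. The sketch below is illustrative only. It assumes that K = √(1 - R²) and that N is the ordinate of the unit normal curve at the deviate cutting off the area P, which is how the standard error formula is read here. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def se_coefficient(P, R):
    """Return a such that S.E. = a / sqrt(n), reading the formula as
    S.E. = K * sqrt(P*Q) / (R * N * sqrt(n)) with K = sqrt(1 - R**2)."""
    K = sqrt(1.0 - R * R)
    # N: normal ordinate at the deviate that cuts off area P (assumed meaning).
    N = STD_NORMAL.pdf(STD_NORMAL.inv_cdf(P))
    return K * sqrt(P * (1.0 - P)) / (R * N)

for P in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(P, [round(se_coefficient(P, R), 2) for R in (0.30, 0.60)])
```

With these assumptions the coefficient is smallest at P = .50 and grows symmetrically as P moves toward either extreme, which is the pattern shown in the table.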




Similar or related conclusions have been reached by a number of writers
(20, 19, 7, 1, 2, 21, 18, 14). Empirical evidence relating to this point has
also been obtained (12, 22, 19).

IV. The Case of Multiple-Choice Items

Suppose that any examinee who does not know the answer to a multiple-
choice item guesses at the answer with 1 chance in k of guessing correctly.
If we denote the item characteristic function for this case by $P'_i$, we have

\[
P'_i = P_i + \frac{Q_i}{k}. \tag{27}
\]

We will give here only the results for the case of “equivalent” items.
In this special case the procedures already outlined lead to the result that
c should be estimated from the equation

\[
P = \frac{1}{n} \left( s - \frac{n - s}{k - 1} \right). \tag{28}
\]

This result differs from equation (22) only in that the usual type of correction
for guessing is applied to the test score. ĉ may of course be calculated from
equation (28).
It is readily verified that, in the case of tests composed of “equivalent”
multiple-choice items, the usual type of test score corrected for guessing,
s - (n - s)/(k - 1), is a sufficient statistic for the estimation of the
examinee's ability (c).

The standard error of ĉ for large n is found to be approximately (cf.
equation 26):

\[
\mathrm{S.E.}_{\hat{c}} = \frac{k}{k - 1} \cdot \frac{K \sqrt{P'Q'}}{R \, N \sqrt{n}}. \tag{29}
\]

Figure 3 illustrates this relation of S.E.(ĉ) to P' for various values of k in the
case when R = .30. Table 2 gives certain selected values of S.E.(ĉ) obtained
from equation (29) for the same values of k and R. For every value of k in
Table 2, two values of P' are listed: (1) the value halfway between the chance
success level and 1.00, and (2) the value of P' for which S.E.(ĉ) is a minimum,
as determined by numerical methods. It is frequently considered that the
former value of P' provides optimum discrimination, but present results
indicate that the most reliable estimate of an examinee's ability is obtained
when P' is somewhat easier than this. Investigations carried through by
Cronbach and Warrington (2) and by the present writer* into other
theoretical measures of the discriminating power of actual test scores have led to
the same conclusion: that optimum measurement of a given examinee's ability
by means of multiple-choice items requires an item difficulty level somewhat
easier than halfway between the chance success level and 1.00. This conclusion
may be rationalized, if desired, as being attributable to the fact that difficult
multiple-choice items tend in general to be less valid than easy ones, since
guessing is more often involved in answering a difficult item than in answering
an easy one.

*See also Hick, W. E. Information theory and intelligence tests. Brit. J. Psychol.,
Statist. Sect., 1951, 4, 157-164, where Shannon and Weaver's information theory is applied
to a closely related problem.


[Figure 3: vertical axis, standard error of ĉ (in 1/√n units); horizontal axis, population proportion of correct answers (P')]
FIGURE 3
The Standard Error of ĉ as a Function of P' for a Test Composed of “Equivalent” k-Choice
Items, When R is .30


TABLE 2

The Standard Error of ĉ for a Test Composed of n “Equivalent” Multiple-Choice Items,
for Specified Values of P' and k, when R is .30

    k     P'      S.E.
    5    .6      4.88/√n
    5    .682    4.80/√n
    4    .625    5.14/√n
    4    .713    5.03/√n
    3    .667    5.64/√n
    3    .759    5.44/√n
    2    .75     6.90/√n
    2    .835    6.52/√n
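
The search for the most favorable difficulty level can be sketched numerically. The code below is an illustration rather than the writer's procedure. It assumes P' = P + Q/k, and it reads the standard error for k-choice items as k/(k - 1) times K√(P'Q')/(R N √n), with N the normal ordinate at the deviate cutting off the area P. The function names and the grid search are hypothetical choices.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def se_coefficient_mc(P_prime, k, R):
    """Return a such that S.E. = a / sqrt(n) for equivalent k-choice items,
    under the assumed reading of the standard error formula."""
    P = (P_prime - 1.0 / k) / (1.0 - 1.0 / k)  # proportion who know the answer
    if not 0.0 < P < 1.0:
        raise ValueError("P' must lie strictly between 1/k and 1")
    K = sqrt(1.0 - R * R)
    N = STD_NORMAL.pdf(STD_NORMAL.inv_cdf(P))
    return (k / (k - 1.0)) * K * sqrt(P_prime * (1.0 - P_prime)) / (R * N)

def best_difficulty(k, R, steps=2000):
    """Grid search for the value of P' that gives the smallest standard error."""
    grid = [1.0 / k + (1.0 - 1.0 / k) * j / steps for j in range(1, steps)]
    return min(grid, key=lambda p: se_coefficient_mc(p, k, R))

for k in (5, 4, 3, 2):
    halfway = (1.0 / k + 1.0) / 2.0
    best = best_difficulty(k, 0.30)
    print(k, round(halfway, 3), round(se_coefficient_mc(halfway, k, 0.30), 2),
          round(best, 3), round(se_coefficient_mc(best, k, 0.30), 2))
```

The minimum is quite flat, so the exact location found by a grid search depends on the grid spacing. In every case it falls above the halfway point.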
V. Confidence Intervals

It has already been pointed out that asymptotically shortest, unbiassed
confidence intervals for estimating c for a given examinee may be set up by
the usual methods (16, Ch. 11) from a knowledge of ĉ and S.E.(ĉ). When
n is not large, more exact methods may be used, as will be illustrated very
briefly in the following.

The question is frequently raised as to what meaning, and how much
meaning, we can attribute to obtained scores that are near the “ceiling”
or the “floor” of the multiple-choice test administered, i.e., to scores that are
nearly perfect or that are close to the expected score for an examinee who
answers all the items by random guessing. We can answer this question by
determining a confidence interval within which the true ability of an examinee
obtaining such a dubious score may be expected to lie. Such confidence
intervals are shown graphically in Figure 4 for most of the possible scores on a
hypothetical Test B.

Test B is composed of forty-nine 5-choice items, each having an R =
.44721. The frequency distribution of the item difficulties (expressed in terms
of h_i) is roughly normal, h_i having a mean of zero and a standard deviation of
.79. The actual frequency distribution of h_i is given in Table 3 (a multiple of
√5 was chosen for h_i in order to facilitate computation). The proportion
(p'_i) of correct answers that will be given to item i by a group of examinees
whose abilities are normally distributed is related to h_i by equation (3) and
by the fact that p'_i = p_i + q_i/k (cf. equation 27).

TABLE 3

Frequency Distribution of Item Difficulties (h_i or p'_i) in Test B

    h_i            p'_i           Frequency
    .8√5           .23            1
    .6√5           .27            [illegible]
    .4√5           .35            6
    .2√5           [illegible]    [illegible]
    0              .60            11
    -.2√5          [illegible]    9
    -.4√5          .85            6
    [illegible]    [illegible]    3
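
The relation between h_i and p'_i can be illustrated with a short computation. This is a sketch under a stated assumption: equation (3) is not reproduced in this part of the paper, so the code takes p_i to be the area of the unit normal curve above h_i. The function name and the evenly spaced multiples of √5 used in the loop are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_correct(h, k):
    """p' = p + q/k, taking p = 1 - Phi(h) for a group whose abilities are
    normally distributed (an assumed form of equation 3)."""
    p = 1.0 - NormalDist().cdf(h)
    return p + (1.0 - p) / k

# Evenly spaced multiples of sqrt(5), chosen here for illustration.
for m in (0.8, 0.6, 0.4, 0.2, 0.0, -0.2, -0.4, -0.6):
    print(f"h = {m:+.1f} * sqrt(5)   p' = {proportion_correct(m * sqrt(5), 5):.2f}")
```

Under this assumption an item with h = 0 is answered correctly by 60 per cent of the group when k = 5, as in the table.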

When c is fixed, the sampling distribution of the obtained score (s) on
Test B is readily obtained by replacing P by P' in equation (5) and then
adding together all those values of L for which Σ_i x_ia = s. The relative
frequencies for all values of s were thus calculated for fixed values of c selected
at intervals of 0.5. The corresponding cumulative frequencies, rounded off
to two decimal places, are shown in Table 4, where each column represents
the cumulative frequency distribution of s for the corresponding specified
value of c.

Suppose now that we have administered Test B to John D., who obtained
a score of 16. Reading across the row for a score of 15 in Table 4, we find
that examinees whose c-score is -3.5 obtain scores of less than sixteen 74
per cent of the time; while from the row for a score of 16 we find that examinees
whose c-score is -2.0 obtain scores of 16 or less 24 per cent of the time. Let
us make the assertion that John's true ability score (c) lies between -3.5
and -2.0. Now, sampling theory assures us (granted the validity of our
basic assumptions) that if we make a large number of similarly derived
statements, these statements will be correct at least 50 per cent (74 minus 24 per
cent) of the time, on the average. The interval -3.5 < c < -2.0 is thus a
50-per-cent confidence interval for estimating c.

This confidence interval is shown in Figure 4 as a horizontal line opposite
the score value of 16. The confidence interval for each possible raw score is
also shown, within the limits of the graph. Actually, all intervals are
determined so as to run from the 25-per-cent to the 75-per-cent level, rather
than from the 24- to 74-per-cent level as described for the sake of simplicity
in the illustration given. The necessary 25- and 75-per-cent points were
obtained by curvilinear graphic interpolation. The dot near the middle of
each confidence interval indicates the value of c for which the likelihood of
occurrence of the given value of s is greatest. This value is of course ĉ, the
maximum-likelihood estimate of c.

These confidence intervals provide an answer to the question of what
meaning and how much meaning to attach to the test scores, in particular
to those near the ceiling or near the floor of the test. As previously pointed
out, the length of the interval provides a measure of the discriminating
power of the test at a given level of test score. Basically the length of the
confidence interval is somewhat analogous to the “standard error of a true
score” discussed in many texts (6, 414; 8, 43).

The longest confidence interval shown in full in Figure 4 is the one for
a score of 15. It extends roughly 1.75 units on the c-scale, from -4.0 to
-2.25. The shortest interval is probably the one for a score of 34, having a


TABLE 4

Theoretical Cumulative Frequency Distribution of Scores on Test B for Selected Fixed
Values of c (all frequencies multiplied by 100)

Selected Fixed Values of c

(s)  -4.0 -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5  0.0 +0.5 +1.0 +1.5 +2.0 +2.5 +3.0 +3.5 +4.0
* 100 
Е 90 95 
47 98 92 т 
We 98 92 76 50 
s 99 95 80 53 26 
а er St м м CH 
as 69 2933 7S)» T 15 03 
42 от 85 56 24 06 01 
41 bo 63 © 38 11 02 
40 98 86 56 23 05 01 
39 99 95 7 40 12 02 
38 98 S9 61 6 06 01 
37 UB 80) "авина 1402 
36 99 93 09 з ст 01 
35 98 S6 55 20 03 
34 90: +1957 ЗЕТА 1 08 
33 9 91 66 25 06 
32 97 84 52 18 03 
31 00; Moe © 7589! ТОБО 
30 9s 90 Gf 27 05 
29 о 96 8з 21 17 02 
28 о 03 ‘74 38 10 01 
27 os 88 02 27 05 
26 99 5 < SL 50 т OS 
25 os 91 71 37, 0 01 
24 од 98 85 0 26 06 
23 08 оз 97 47. Мо 
22 © = 88 м 35 № OL 
21 о 98 3 81 5 29 05 
20 о 96 зз 71 43 16 03 
19 90 97 з sı 60 30 0 01 
18 08 04 87 72 4% 21 05 [01 
17 95 00 80 401 35 18 02 
16 91 83 70 48 м ст а 
15 85 7 58 B0. 15 , 04 
14 б 68 45 24 (09 . 02 
13 [CE NEC MO 
12 52 37 22 00 02 
11 85 25 18 05 01 
10- 6:20. 18; ШК 3 
9 16 09 оз 0 
8 о о 01 
04 2 01 
02 01 
01 


Ornehaan 


[Figure 4: vertical axis, Test Score (s); horizontal axis, True Ability Score (c)]
FIGURE 4
Fifty-Per-Cent Confidence Interval for Estimating True Ability Score (c) from
Each Obtained Score (s) on Test B


length of only 0.8 units approximately. It is again apparent that the best
discrimination is obtained at score levels somewhat above the point halfway
between a chance score and a perfect score.
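
The interval construction used in this section can be sketched in code. What follows is a minimal illustration, not the graphic interpolation actually used. It assumes the normal-ogive curve P = Φ((Rc - h)/K) with K = √(1 - R²), the guessing relation P' = P + Q/k, and independent item responses. All function names are hypothetical. The example test is made up of forty-nine identical 5-choice items, chosen because its specification is complete, and is not Test B.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def item_prob(c, h, R, k):
    """P'(c) = P + Q/k with an assumed normal-ogive P = Phi((R*c - h)/K)."""
    K = sqrt(1.0 - R * R)
    P = STD_NORMAL.cdf((R * c - h) / K)
    return P + (1.0 - P) / k

def score_cdf(c, items):
    """Cumulative distribution of the number-right score for fixed c.
    items is a list of (h, R, k) triples; responses are taken as independent."""
    dist = [1.0]  # dist[s] = probability of score s on the items seen so far
    for h, R, k in items:
        p = item_prob(c, h, R, k)
        nxt = [0.0] * (len(dist) + 1)
        for s, w in enumerate(dist):
            nxt[s] += w * (1.0 - p)
            nxt[s + 1] += w * p
        dist = nxt
    total, cdf = 0.0, []
    for w in dist:
        total += w
        cdf.append(total)
    return cdf

def solve_c(target, s, items, lo=-8.0, hi=8.0):
    """Bisection for the c at which Pr(score <= s | c) equals target.
    The cumulative frequency falls as c rises."""
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if score_cdf(mid, items)[s] > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def interval_50(s, items):
    """Interval running from the 25- to the 75-per-cent level for score s."""
    lower = solve_c(0.75, s - 1, items)  # scores below s occur 75% of the time
    upper = solve_c(0.25, s, items)      # scores of s or less occur 25% of the time
    return lower, upper

# Illustrative test: forty-nine 5-choice items, all with h = 0 and R = .44721.
items = [(0.0, 0.44721, 5)] * 49
print(interval_50(30, items))
```

For scores close to the chance level the required 75-per-cent point may not exist within the search range, which corresponds to intervals that run off the edge of the graph.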


VI. Power Functions

Let us suppose that, having standardized our test items on some basic
group, we wish to test the hypothesis that a certain examinee has a true
c-score of -0.5 or better. Suppose further that we wish to set up a 50-per-
cent critical region for testing this hypothesis. Referring to Table 4, we see
that 50 per cent of the examinees whose c-score is -0.5 will obtain actual
test scores of 27 or better on Test B, and 50 per cent will obtain actual test
scores of 26 or worse. We may, therefore, decide that we will administer Test
B to the examinee and accept the hypothesis if he obtains a score of 27 or
better, reject the hypothesis if he obtains a score of 26 or worse.

The power function for this test of the hypothesis is the probability of 
rejecting the hypothesis, this probability being considered as a function of 
the true c-score. This function may be read off from Table 4 for selected 
values of c. It is given by the broken line in Figure 5. 


[Figure 5: horizontal axis, True Ability Score (c)]
FIGURE 5
Power Function for Test A (solid line) and for Test B (broken line) for Testing the
Hypothesis that the Examinee's True Ability (c) Is Greater Than -0.5

We may use the power function to determine which of two tests measuring
the same ability is better for the purpose of testing the stated hypothesis.
The solid line in Figure 5 represents the power function of another test, which
we will call Test A, composed of forty-nine 5-choice items each having R =
.44721 and h = 0. Test A is identical with Test B except that in Test A all
items are of 50-per-cent difficulty after correction for guessing. As shown by
the figure, Test A gives a lower probability than Test B of rejecting the
hypothesis when it is true, and a higher probability of rejecting it when it
is false. It is thus seen that Test A provides the more powerful test for the
stated hypothesis.
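
A power function of this kind can be computed directly for a test of equivalent items, since the score is then binomially distributed for fixed c. The sketch below uses the specification given for Test A and assumes the normal-ogive curve P = Φ((Rc - h)/K) with P' = P + Q/k. The cutoff of 25 is an assumption made here for illustration, since the cutoff used for Test A is not stated. The function name is hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

def reject_probability(c, cutoff, n=49, k=5, R=0.44721, h=0.0):
    """Pr(score <= cutoff | c) for n equivalent k-choice items, i.e. the
    chance of rejecting the hypothesis for an examinee of true ability c."""
    K = sqrt(1.0 - R * R)
    P = NormalDist().cdf((R * c - h) / K)
    p = P + (1.0 - P) / k
    return sum(comb(n, s) * p**s * (1.0 - p)**(n - s) for s in range(cutoff + 1))

# Illustrative cutoff only: reject when the score is 25 or less.
for c in (-2.0, -1.0, -0.5, 0.0, 1.0):
    print(c, round(reject_probability(c, 25), 2))
```

With this cutoff the rejection probability at c = -0.5 is close to one half, and it falls steadily as c increases.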


Discussion

The “ability score” (ĉ) developed here is a measure of ability that will
remain unchanged, except for sampling fluctuations, no matter which of
various different tests of the same ability is administered, provided always
that the basic assumption represented by equation (1) is fulfilled. This score
will have the usual valuable characteristics of a maximum-likelihood
estimate.* In sufficiently large samples the ĉ-score will (1) approximate closely
to the true value of c, (2) have an approximately normal sampling
distribution, (3) have a minimum sampling error in comparison with other comparable
statistics that might be used for estimating c. Unfortunately, we are not,
at present, able to describe the properties of ĉ for small n, nor to state how
many items are required to constitute a “large” sample.

It has further been shown that ĉ is a sufficient statistic when the test
is composed of equivalent items, i.e., that the c-score contains all the
information contained by the examinee's responses relevant to the estimation of his
ability. This property of the c-score holds for small as well as large samples.

Suppose that a large number of free-response items all measuring the
same ability have been “calibrated” on some large basic group of examinees,
i.e., that the values of h_i (difficulty index) and R_i (item discriminating power)
for each item have been determined for this group with a sufficient degree of
accuracy. Suppose next that statistical tests disclose that the data are
compatible with the basic assumptions made here in relation to the item
characteristic curves (some empirical evidence is available (13) to indicate that this
supposition is not unreasonable, at least for certain types of test data).
Finally, suppose that a variety of tests are built from various items selected
from this calibrated pool, and that one such test is administered to each of a
number of different examinees or groups of examinees.

If the new groups tested are sufficiently similar to the basic group so
that the item characteristic curves remain unchanged and independent, we
can theoretically obtain for each examinee from the formulas derived here a
c-score that will place all examinees on the same ability scale, i.e., on the
scale on which the basic group (for which the items were calibrated) has
a mean of zero and a standard deviation of 1. This may theoretically be done
even though the different groups of examinees are at somewhat different
ability levels, and even though they take entirely different tests.
Unfortunately, even for free-response items, the necessary computations would in
general be very onerous.† An exception is the special case where each test
is composed of equivalent items.

*This statement is not definitely known to hold for the ĉ-score obtained in [...],
although it clearly holds for the cases discussed in all other sections. [...] In
the other sections the item parameters are treated either as known [...]
so that it seems reasonable [...] maximum-likelihood estimates to have the usual
optimum properties. This conclusion is not known to have been rigorously proved,
however, nor, to the writer's knowledge, has such a situation even been discussed
in the literature.

†Dr. Ledyard R. Tucker has devised a least-squares method [...] feasible for
large n [...]. Consideration is being given to the possibility of using this
method in the actual scaling [...].




In the case of multiple-choice tests the application of the formulas in
a practical situation is more problematical. Although in theory it is often
assumed that 1/k (the probability of guessing the correct answer to an item)
is equal to the reciprocal of the number of responses per item, actual data
usually show that k is often less than this, and that it varies from item to
item.

Regardless of these practical difficulties, there is value in the theoretical
conclusions to which the present approach leads. Included in these
conclusions are the following:

(1) In the case of a test composed of “equivalent” free-response items,
the usual type of test score is a sufficient statistic for estimating the
examinee's ability (c).

(2) The standard error of ĉ may be used as a measure of the
discriminating power of the test for examinees at a given level of ability.

(3) Confidence intervals may be set up within which the examinee's true
ability (c) may be assumed to lie. The length of such a confidence
interval may be taken as a measure of the discriminating power of
the test for examinees at a given level of test score.

(4) A given examinee's ability can be estimated more accurately by
administering free-response items that are of 50-per-cent difficulty
for examinees like him than by administering free-response items
at any other single difficulty level.

(5) In the case of a test composed of “equivalent” multiple-choice
items the usual type of test score corrected for guessing is a sufficient
statistic for estimating the examinee's ability.

(6) Optimum measurement of a given examinee's ability level by means
of “equivalent” multiple-choice items requires an item difficulty
level somewhat easier than the halfway point between chance-success
and 100-per-cent correct answers.

An illustration has been given of the use of the Neyman-Pearson power
function to determine which of two tests measuring the same ability will be
most powerful for the purpose of discriminating those examinees whose
c-score lies below some predetermined value from those examinees whose
c-score lies above this value.

In closing, it may be noted that the problem of estimating the c-scores
of a group of examinees is very similar to the problem of determining latent
structure (11, 5). Instead of estimating the shape of the distribution of
c-scores in the group tested, as would Lazarsfeld, we here estimate the c-score
of each individual examinee.

REFERENCES

1. Brogden, H. Variation in test validity with variation in the distribution of item
difficulties, number of items, and degree of their intercorrelation. Psychometrika,
1946, 11, 197-214.

2. Cronbach, L. J., and Warrington, W. G. Efficiency of multiple-choice tests as a
function of spread of item difficulties. Psychometrika, 1952, 17, 127-147.

3. Ferguson, G. A. Item selection by the constant process. Psychometrika, 1942, 7,
19-29.

4. Finney, D. J. Probit analysis. Cambridge: Cambridge Univ. Press, 1947.

5. Green, B. F. Latent class analysis: A general solution and an empirical evaluation.
Ph.D. thesis, Princeton University, 1951.

6. Guilford, J. P. Psychometric methods. New York: McGraw-Hill, 1936.

7. Gulliksen, H. The relation of item difficulty and inter-item correlation to test variance
and reliability. Psychometrika, 1945, 10, 79-91.

8. Gulliksen, H. Theory of mental tests. New York: John Wiley and Sons, 1950.

9. Lawley, D. N. On problems connected with item selection and test construction.
Proc. roy. Soc. Edin., 1943, 61-A, Part 3, 273-287.

10. Lawley, D. N. The factorial analysis of multiple item tests. Proc. roy. Soc. Edin.,
1944, 62-A, Part I, 74-82.

11. Lazarsfeld, P. F. (with S. A. Stouffer et al.). Measurement and prediction, Vol. 4
of Studies in social psychology in World War II. Princeton: Princeton Univ. Press,
1950, Chs. 10 and 11.

12. Long, J. A., and Sandiford, P. The validation of test items. Bulletin No. 3,
Department of Educational Research, University of Toronto, 1935.

13. Lord, F. M. A theory of test scores. Psychometric Monograph No. 7, 1952.

14. Lord, F. M. The relation of the reliability of multiple-choice tests to the distribution
of item difficulties. Psychometrika, 1952, 17, 181-194.

15. Lorr, M. Interrelationships of number-correct and limen scores for an amount limit
test. Psychometrika, 1944, 9, 17-30.

16. Mood, A. M. Introduction to the theory of statistics. McGraw-Hill, 1950.

17. Mosier, C. I. Psychophysics and mental test theory: fundamental postulates and
elementary theorems. Psychol. Rev., 1940, 47, 355-366.

18. Mosier, C. I. Psychophysics and mental test theory. II. The constant process.
Psychol. Rev., 1941, 48, 235-249.

19. Richardson, M. W. Relation between the difficulty and the differential validity of
a test. Ph.D. thesis, University of Chicago, 1936. Also Psychometrika, 1936, 1
(No. 2), 33-49.

20. Symonds, P. M. Choice of items for a test on the basis of difficulty. J. educ. Psychol.,
1928, 19, 73-87.

21. Thorndike, R. L. Personnel selection. New York: John Wiley and Sons, 1949, pp.
228-230.

22. Thurstone, T. The difficulty of a test and its diagnostic value. J. educ. Psychol.,
1932, 23, 335-43.

23. Tucker, L. R. Maximum validity of a test with equivalent items. Psychometrika,
1946, 11, 1-13.

24. Tucker, L. R. A method for scaling ability test items in difficulty taking item
unreliability into account. (Abstract) Amer. Psychologist, 1948, 3, 309-10.

25. Wilks, S. S. Mathematical statistics. Princeton: Princeton Univ. Press, 1944.


Manuscript received 2/26/52

Revised manuscript received 8/29/52




PSYCHOMETRIKA, VOL. 18, NO. 1
MARCH, 1953


A REVISED ORTHOGONAL ROTATIONAL SOLUTION FOR
THURSTONE'S ORIGINAL PRIMARY MENTAL ABILITIES
TEST BATTERY


WAYNE S. ZIMMERMAN


BRANDEIS UNIVERSITY


By extension of the rotational process, meaningful orthogonally related
positions were found for all of the thirteen centroid factors which Thurstone
extracted from his original PMA intercorrelations. Most of the original
primary ability factors were more sharply delineated and corresponded more
closely to the Army Air Force factors that bear similar names (demonstrating
greater invariance from analysis to analysis). While such different results
obtained by two investigators applying the same methods on the same data
may initiate some concern, the results strengthen rather than weaken the idea
that more psychological meaningfulness and greater invariance will result if
centroid axes are rotated, using the concepts of a simple structure and positive
manifold.

Introduction

In naming the rotated factors in his early classical fifty-seven variable
analysis, Thurstone (1) exercised caution. Of thirteen factors extracted, he
identified only seven with assurance and two others tentatively. He did not
try to rationalize the test loadings on axes 10, 11, 12, and 13.

Two of the last four columns apparently were rotated toward a simple
configuration, since both axis 10 and axis 12 meet tests of simple structure
and positive manifold as well as, or better than, at least one of the factors
that was identified. Residual axis 10, with a minimum value of -.077,
contained thirty-four entries within the range of ±.20, which Thurstone
described as nearly vanishing, plus five above .40, while residual axis 12 with
a minimum value of -.10 contained forty entries within ±.20 and five
above .40.*

One reason why these rotated factors were not named is that in each
instance there are grouped significantly loaded tests with no apparent
functional unity.

The abundance of variance left on these residual factors furnished one
stimulus for seeking a revised rotational solution. Also, one of the residuals
promised upon further rotation to represent a spatial-visualization factor,
in addition to primary factor S (Spatial) (2, 270). Inspection of the over-all
factor configuration suggested that if the rotational process were continued,
the reallocation of variance among the primaries and the residuals should
yield factors that would parallel more closely similar factors in Army Air
Force data (3). Thus it appeared that greater invariance could be
demonstrated and that the demands of simple structure and of positive manifold
would be met more effectively.

*Thurstone considered loadings between ±.20 as negligible and considered only
loadings of at least .40 in naming factors.


Method


Further rotations were planned using the final published values as a 
starting point. Values were plotted and the axes were then rotated orthog- 
onally using a simplified graphical method (4). Eighty-six additional 
pair-by-pair rotational adjustments were required. 
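
A single pair-by-pair orthogonal adjustment can be written out numerically. The sketch below is an illustration only. The graphical method (4) is not described in this report, so the code shows the plane rotation that such an adjustment amounts to and does not reproduce the procedure actually followed. The function names and the loadings are made up.

```python
from math import cos, sin, radians

def rotate_pair(loadings, j, k, degrees):
    """Rotate factor columns j and k of a loading matrix through an angle.
    Other columns, and each test's communality, are left unchanged."""
    t = radians(degrees)
    rotated = []
    for row in loadings:
        new = list(row)
        new[j] = row[j] * cos(t) + row[k] * sin(t)
        new[k] = -row[j] * sin(t) + row[k] * cos(t)
        rotated.append(new)
    return rotated

def nearly_vanishing(loadings, j, limit=0.20):
    """Count the entries of column j that lie within +/- limit."""
    return sum(1 for row in loadings if abs(row[j]) <= limit)

# Made-up loadings for three tests on two factors (not from the battery).
A = [[0.60, 0.30], [0.50, 0.25], [0.10, 0.70]]
B = rotate_pair(A, 0, 1, 26.565)
print(B, nearly_vanishing(B, 1))
```

In this toy case the rotation moves two of the three tests into the nearly vanishing range on the second factor, which is the kind of gain in simple structure sought at each step.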


Results 


On the following pages the rotated factors described by Thurstone are
compared with the factors after the additional rotational adjustments.
Thurstone's rotated factorial matrix (1, 115, 116) is not reproduced here,
although loadings pertinent to comparison with the new results are listed for
each factor. The new rotated factorial matrix is presented in Table 1.

All but one of the seven original factors that were identified with
assurance are directly comparable. Nevertheless, there are some significant and
interesting differences between the two solutions. As a result of the new
rotations, two of the factors previously identified at least tentatively were
changed enough to warrant re-evaluation. Four other factors were added,
since all four of the residuals were rotated to meaningful positions.*


Definition of the Rotated Factors
and Comparison of Results in the Two Solutions


Spatial Relations. Thurstone's first rotated factor was his factor S.
It contained thirty-six projections in the nearly vanishing range of ±.20
and the lowest value was -.092. After the new rotations there were thirty-
seven nearly vanishing entries, the lowest being -.14. The thirteen tests
with loadings of .40 or more in at least one rotational solution follow.†

*[...] six of the group factors and one of the doublets to the factors Thurstone
identified. The second doublet, “Rhythm,” can be directly related to the
“Classification” factor described in this report. The two highest loadings on
[...], “Analogies,” [...] only .27 and .17, respectively (5).

†For each factor, the tests are listed in the order of their loadings in the revised
solution.


SPATIAL RELATIONS

                                     Thurstone's    Revised
    Name of Test                     Solution       Solution

    (20) Flags                       .636           .727
    (22) Lozenges B                  .633           .604
    (18) Cubes                       .626           .592
    (53) Hands                       .455           .547
    (17) Block Counting              .418           .524
    (27) Pursuit                     .584           .518
    (23) Surface Development         .551           .500
    (19) Lozenges A                  .448           .400
    (45) Syllogisms                  .430           .398
    (21) Form Board                  .415           .317
    ( 8) Figure Classification       .393           .222
    ( 6) Verbal Classification       .411           .211
    (55) Sound Grouping              .412           .211


In the revised solution, Flags was more clearly the best representative.
Hands and Block Counting also gained significantly, while Pursuit and
especially Form Board, Figure Classification, Verbal Classification, and
Sound Grouping lost significant weight. Variance on the Form Board test
transferred to a new Visualization factor; variance on Figure Classification
to the Perceptual Speed factor; and variance on Sound Grouping, to another
reasoning factor. Thus the readjustments tended to purify the factor,
effecting a significant improvement in structure and psychological
meaningfulness.*

Perceptual Speed. Axis II, after rotation, defined Thurstone's Perceptual
Speed factor. In his rotations thirty-seven nearly vanishing entries resulted,
with a minimum value of -.072. In the new rotations, forty-two tests
appeared in the ±.20 range and the minimum value was -.128.


PERCEPTUAL SPEED

                                     Thurstone's    Revised
    Name of Test                     Solution       Solution

    (26) Identical Forms             [illegible]    .603
    ( 6) Verbal Classification       .587           .581
    (59) Word Count                  [illegible]    [illegible]
    ( 7) Word Grouping               [illegible]    [illegible]
    (51) Picture Recall              .545           .341
    (11) Completion                  .422           .311
    (14) Disarranged Sentences       [illegible]    .300
    (60) Vocabulary                  [illegible]    [illegible]
    (44) Pattern Analogies           [illegible]    [illegible]
    (41) Verbal Analogies            [illegible]    [illegible]


The revised rotations strengthened the loadings of Identical Forms
primarily and Verbal Classification secondarily on the Perceptual Speed

*See description of Visualization factor and final paragraph under Eduction of
Relationships factor.




"gq 199 тс 898 28° 267 10’ 898 6817 900 — 622°  980':— һо 1 0 SutKdoj) '82 

725 esc’ 220 LEI’ 691 FE 010'— 2868 828 100`— 911 — 880° — 091 тг ele qmsmq 724 

‘9g 299° 80 167 Get’ 911 100°  сРро'— 680 660 — FSI’ 261  GF0 — 857 вет’ SUMO [VOYUOPT “9% 
706 G69" те0' — 968° Ir 106° 88° ere” eir — ПК 980  98T^ 980'— 691:  26l' SjuouigAoIN 

[voruByooyy ‘65 

рс 182° 980— 219° 096° 820° 965“ бт’ £06;  r90'— 987 OFT” сео, 00° 99 5910 peqpounq “FZ 
'£G 969 807 #66 120° 95’ GIG 180 Fe 190'— 687 010'— 020 GFT 00g" quouido[oAo(T 

oowjng EZ 

‘ZZ LRL 9107 266 607 280 — FIG;  *80' 268 090° 085° 200'-; 260' 291 +109’ Я so3uozoT ‘gZ 

preog wog “TS 


16° 867 219 968° 816 210 8817 gor’ GOT’ 867 ZIT’  990'— 690° 2те" 
'05 $18° 280'— 998° 616’ 861 — FFO'— FeO” 981 180 9/0 #807 IGE’ 860° 252' soupy ‘05 
‘6T 9827 TOT 28€ пе #Р1'— GST’ 90° 865°  9c0'- #00 018° 290 080° 00Р` ү soSuozoT '6T 
"вт тел 140 865 696° 860 мо ға бё 815° 600° —9c0 — 880 888° 686" вәд) ‘ST 
Л 629 907 008 68 TGI’ 260'— 905°  00L'— 221 — 000 120° 880° 980 Fes’ Surjunoj) удо '21 
FIO’ S20 288° 16 82 РЕГ 168  6F0'— suXuou&g oAnuoAu[ '91 


"Ie 0 


‘QT TIL’ SeT 090'— ggg #9 998" 

“et 326 800° 610'— SIT’ 068" 0017 90° 821 cce S0z $10 — FZI’ FO" — 000 зиму ‘от 

FI cg! 020 OBI’ — 66 020° аР  I00'— Scc; 687 808 пс 980 008 EFS soouojuog 
родива та TI 


9° 9Р0°— 2807 666 6I +86 880: — 801 Sr 260 241’ SOT” 820 ТІ" 304497 ISU рив jsp “ET 


"el 

‘er Toe 060° OFS еа SST’ 980° 060' 210 GIS’ 90% GES 907 9917 8Р0 5рлоду родити “ZT 

"IL 988° 0617 981 886 GE StO'— FET” 820'— сс 296 19’ 160 — Пе TET uonopdum;) ‘И 

‘or ма ese 801°— 2487 OFF OTT’ €80' ccl' 155’ 515’ 69° 180 $00 — LET’  Sejsodd()oAnuoeAu ‘от 

'6 029° Ser 661 — 190° 990 280° 220'— 281 200 — SII’ 266 910° 880 Аг uormioossy 
рәцолупогу '6 


010°— Iel'— 900'— 9567 GGG  —uonvogisse[) әй] '8 
720" Зшаполо ріод 7/4 
uorvopmsv[) [UqIoA ‘9 

II Sujpuoq ‘$ 


T8 110 — OFT’ МР’ 000 927 бс eer 887 
'L AL. 0198  100:— ФР 887 0217 280° Oel 980 2907 82 960 928 
`9 сос 800° 9168 3877 260° 2927 228° $80 #97 ее 100 188 Пё 

сос 5517 117 org’ 610°— 558° 90! FIO 0817 — 220— 


'e 688 (0) ат 6б а 
т 19 900° 900° 90° 66 9677 бга 808°  100'— cec 869 100 — eS — S60 — pSupeoy p 
WM “А ч ЛА а мо 9 ul Hy A N a s 89] Jo ошту 


шх UX IX X XI IHA ПА IA A AI ш п 1 


ХНА [VOJIRI 1110904310) pojujoor 
(uornjog рәзтлә) sisAqeuy sonyiqy [чә Алия 


І WISYIL 


709 ?E0'I 240) 860° 5/5  091' 200`— 602" 10° 988° 18° 929°  960'— T6z'  600'— Алејпаеоод ‘09 
"69 ФЕР PZE FOO $60"— 960° Z8 SO 010° 370`— 880`— 690 ggz О Р OFT — FUNOD poA 6с 
= 89 96 917 808°  260'— I9$'  680'— 920: — SIT" 9807 490° £97 80° OFT’ _ Р60'— (одвотцо)) 
© Алепавоод сс 
"29 692° 8017 #60'— TOI’ 091" Iz 85° TOS 819 60 057° 080° 150`— FAL пошти) g 
798 249 268° 900°— 860'— 872"  T0z' 682’ 890° 60Р 910°— eer 870° 290° 690 Suyjedg “gg 
"99 808° 881 9£0'— ос" 010° 00° 190° egg 028 9007 008° тег 861 те Зи полку punog ‘gg 
18 6L GIU 80°  960'— 282^  090'— 860° gie 105° 610 ЕТ 050°— с10'— 280' UPAH ye 
"69 0297 ТОТ’ 907° 861° 001'— 680'— 280° 98r 8617 028 291 — 88U 91'— де spuvyy “eg 
'6© 602 тер’ 807 FEI" 006° 181" 951° 182' 390° OFT" ЧЄ 260° OTT? 881'– our ‘59 
‘TS 89 1697 10° FST" 010°  $90'— 600° %60°— 560 297 $00 1/0 Tre’ 910° [[#дә}[ oinjorq ‘те 
.09 20667 SFT’ 0/5 816:  180'— по 800° лет +0 FIY 8917  00L'— 280'— 8/0`—  uoniusoooy ainsi ·06 
‘OF 88р 966 08° 200° 020° с eI0— GU 2g 9887 ТӨГ 080'— 610° 220'—  uoniusoooyg ром ‘69 
"ВР 904° ZIT’ Т10'— 620'— 020'— FOI’ 108° Ost’ 110°— 604 230`— 861 ер Zio- пост М-да ‘8 
z Ш 800 80° 940° 280° 1/8° gee 9/0°— e6l' SAU 80 Sic 8/0 020° g60" — spen "4p 
5 oF Ger 980 0 $90 — OFZ 26 Рао 20° 690° SIY 6507 200 бт 6 FO" JoqumN PIOM '9p 
2 Рош Лт’ ISI 981° FES" сер 1867 6б 980° ST" 955° 106  SIG' 868’ SUISITOI[AG “Gp 
3 ‘th 018 TOC Ме 908 OPI +81" ИР 808 LT" 020' 85° 060'— 15° 807 soBopwuy uioyyuq 'РР 
Я ЕР seg’ 190 206: ОР 220'— тог 855° CcE&'  YIO'— 198° gege’ 066 uz seg SPIOM гроб ‘ер 
N СЕР 462 90° 240'— 960° 080° 09° 06 SFI’ 666 005°  162' 00°— 681° ост’ Sesmreiq өзгө “SP 
à СЇР US' 681°  8I0— 8698 161` SU 690° 0867 SFO" — Ger. бор OFT" 11° ggg Sodoppuy тел "Tj 
EB ОР 858° 09£0'— арг BT" 600:— 809° @ 961 ле мг СӘР 170°  Gec0'— 890' Sumosvoq ‘ор 
а 768 1985 866 ZEI ТОГ = OT" SU 69° ZEO'— #00'— 696 GGL" 086 0Р0°— FOS Ящиоѕтәц 
= : punounpuy 'бе 
"o CS $00 0% ес 880 — 110` 200: — +09" 1/0" соо ео’ cor сте 060 ggo’ зари 
e а “Ss 
26 TEL АТ  I£0— FZ 080° 968° Јер 968" SI0 — FGI” 220 195° 10° 990° SOMOS 10qumN “LE 
‘98 O9F TU 880°— 80 | 281'— SIZ 821 20L-— 690'— 110 616 210`— 980°— ge Sunvums; “9g 
098 089° 820 — TIT’ SPI 9807 SU 16° #96" 980 #93" 9902 20" 220° ec uonopdurop) emque, ‘98 
ТЕ 602 8Р0 060° 080`— 680° 10° 86+ бт 881 610 FIT’ ёс т бл UOISIANT "pg 
‘E€ 808' 0667 09017 190'— 9IL' #60'— gzz zeo оте 6807 9€90'— 692° ZIO Zor чодвоца пу ‘eg 
066 602 108" 200 00° 8/0°  S9U' STO 15 б 610 — 81 689° ScU-— LOT uonounqng '22 
18 980 160 0£0'— TIT’ 60° 287 80° 1/0° 60 8207 080'— $94! 680° ег чотирру “TE 
`08 688° FLT’  ceU сор 290 881" вет" 000° 220° S86 2/07 619° 80° pe эро 1equmN 'og 
'66 269 6600 095° 905° ЄП` тог #68 29° O 1667 880° cc0' 8/0`— 802' Story ‘65 
zt чи *A a TA a чо 9) AT и A N d БУ 389, JO ouruN 
шх их XI X XI ША IA 1А А AI ПІ II I 


(ponuuop) т STHY.L 


82 PSYCHOMETRIKA 


factor. Word Count also gained, but most other tests lost weight in the 
exchange. Seven tests with loadings ranging from .417 to .573 dropped 
out of the significant class of .400 and above. Word Grouping, Pattern 
Analogies, and Verbal Analogies shifted weight to a new reasoning factor. 
Vocabulary and Completion contributed much of their former perceptual 
variance to the Verbal factor. Picture Recall and Disarranged Sentences 
gave up perceptual variance to a new factor (Memory for Observed Relation- 
ships) on which they appeared together. 
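The repeated statements that a test "shifted weight" or "gave up variance" from one factor to another follow from the fact that an orthogonal rotation of a pair of axes leaves a test's total common variance on that pair unchanged. The following Python sketch illustrates the point for a single test; the function name `rotate_pair`, the loadings, and the angle are hypothetical values chosen for illustration and are not taken from the analysis.

```python
import math

def rotate_pair(a1, a2, degrees):
    """Rotate one test's loadings on two orthogonal axes through an angle."""
    t = math.radians(degrees)
    b1 = a1 * math.cos(t) + a2 * math.sin(t)
    b2 = -a1 * math.sin(t) + a2 * math.cos(t)
    return b1, b2

# Hypothetical loadings of one test on two factors (illustrative only).
a1, a2 = 0.55, 0.20
b1, b2 = rotate_pair(a1, a2, -30)

# The sum of squared loadings on the two axes is the same before and after.
before = a1 ** 2 + a2 ** 2
after = b1 ** 2 + b2 ** 2
print(round(b1, 3), round(b2, 3))
print(round(before, 4), round(after, 4))
```

In this sketch the loading on the first axis falls while the loading on the second rises by a compensating amount in squared terms, which is the sense in which variance is said to transfer between factors.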

In describing the factor in which Identical Forms has the greatest 
saturation, Thurstone wrote, 


A hypothesis which agrees with introspective study of the mental operation essential in
these tests is that the factor is essentially perceptual in character. Strictly speaking, all
the tests in the battery involve perception and vision in particular... The perceptual
function here seems to be a facility in perceiving detail that is imbedded in irrelevant
material (1, 80-81).


The perceptual element in Picture Recall, Word Grouping, Disarranged 
Sentences, and in the Thorndike Vocabulary test was rationalized originally 
as due to a low level of item difficulty, emphasizing the speed element. In 
the new rotations, the same explanation seems less strained in view of the
diminished loadings. 

Numerical. Axis III yielded another of Thurstone’s primary abilities. 
The original rotated loadings contained thirty-eight within the range of
±.20, with a minimum of -.14. In the new solution, forty-five loadings
ranged between ±.20, the lowest being -.10.


NUMERICAL

                                Thurstone’s   Revised
Name of Test                    Solution      Solution

(33) Multiplication               .812          .769
(31) Addition                     .755          .764
(32) Subtraction                  .670          .659
(30) Number Code                  .625          .619
(34) Division                     .619          .584
(35) Tabular Completion           .392          .402
(38) Numerical Judgment           .432          .345


The structure of the Numerical factor remained essentially unchanged.
Apparently the simple functional operations of addition, multiplication,
subtraction, and division are basic to this ability, with multiplication and
addition representing it best.

Verbal. Rotated Factor IV is Thurstone's Verbal Relations. Thirty
loadings fell within the ±.20 range, the strongest negative being -.065.
The new rotations yielded thirty-two "negligible" values and a minimum
of -.137. Although in both rotational solutions the axis was projected
through the large cluster of verbal tests, its direction differed significantly.


VERBAL

                                Thurstone’s   Revised
Name of Test                    Solution      Solution

(58) Vocabulary (Chicago)         .395          .763
( 5) Reading II                   .506          .706
(60) Vocabulary                   .385          .676
( 4) Reading I                    .552          .638
(10) Inventive Opposites          .639          .549
(11) Completion                   .333          .541
(16) Inventive Synonyms           .495          .478
( 7) Word Grouping                .456          .478
(40) Reasoning                    .420          .465
(41) Verbal Analogies             .597          .459
(52) Theme                        .357          .435
(56) Spelling                     .386          .438
(57) Grammar                      .498          .420
(42) False Premises               .424          .391
(55) Sound Grouping               .453          .300
( 9) Controlled Association       .450          .222
(14) Disarranged Sentences        .395          .211


Inventive Opposites is highest in Thurstone's solution, while the new
location of the axis places four other tests, Vocabulary (Chicago), Reading
II, Vocabulary, and Reading I, at the top of the list. The first three and
also Completion gained considerable weight at the expense of the Restrictive
Reasoning factor. The two reading tests picked up significant weight from
Residual Factor 12. Vocabulary and Completion drew also from Perceptual
Speed, while Inventive Opposites, Verbal Analogies, Sound Grouping,
Controlled Association, and Disarranged Sentences transmitted weight to new
factors. The major portion of the transferred variance of Inventive Opposites
and Controlled Association went to a new verbal fluency factor; of Verbal
Analogies to a new relationship factor; of Sound Grouping to a new classi-
fication factor; and of Disarranged Sentences to a new memory factor.

In his discussion of the Verbal Relations factor, Thurstone wrote, "In
all these tests the subject must deal with ideas, and the factor is evidently
characterized primarily by its reference to ideas and the meaning of words."
(1, 84). With the vocabulary tests now heading the list, even greater stress
is placed upon the knowledge of meanings of words. At the same time the
factor is more clean-cut and univocal.

Rote Memory. Factor M (Memory) originally had thirty-seven entries
in the ±.20 range and a minimum of -.080. After the additional rotations,
thirty-five values were between ±.20, with a minimum of -.116.

MEMORY

                                Thurstone’s   Revised
Name of Test                    Solution      Solution

(48) Number-Number                .664          .709
(47) Initials                     .487          .528
(46) Word-Number                  .529          .518
(50) Figure Recognition           .420          .514


In the second solution Number-Number continued to represent the factor 
best, extending its projection from .664 to .709. Initials and Figure Recog- 
nition also gained as a result of the new rotations. All four of the leading 
tests were designed as memory tests, and the first three are in paired-associ- 
ates form. Whatever the emphasis, “There seems to be no doubt that this 
factor is concerned with memory." (1, 85). Of three memory factors listed 
by the aviation psychologists, one identified as “Paired-Associates Memory” 
compares closely with this factor (3, 823, 826). 
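The comparison of the two solutions for this factor can be tabulated directly. The following Python sketch uses the four pairs of loadings from the MEMORY table above and reports the change for each test; the variable names are illustrative only.

```python
# Loadings from the MEMORY table: (Thurstone's solution, revised solution).
memory = {
    "Number-Number": (0.664, 0.709),
    "Initials": (0.487, 0.528),
    "Word-Number": (0.529, 0.518),
    "Figure Recognition": (0.420, 0.514),
}

# Change in loading for each test, rounded to three decimals.
changes = {name: round(new - old, 3) for name, (old, new) in memory.items()}

# Tests whose loadings rose under the revised rotations.
gained = [name for name, d in changes.items() if d > 0]

for name, d in changes.items():
    print(f"{name:20s} {d:+.3f}")
```

Three of the four tests show a gain, and Word-Number shows a small loss, in agreement with the description in the text.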

Letter Fluency. Thurstone’s rotations for factor W, like the new rota-
tions, yielded thirty-nine values within the range of ±.20, with a minimum
of -.18.

LETTER FLUENCY

                                Thurstone’s   Revised
Name of Test                    Solution      Solution

(15) Anagrams                     .584          .552
(12) Disarranged Words            .512          .519
(57) Grammar                      .530          .518
(56) Spelling                     .508          .463
(13) First and Last Letter        .388          .448
(60) Vocabulary                   .413          .386


Anagrams, Disarranged Words, Grammar, and Spelling continued with 
the highest loadings. Projections for Spelling were shortened and for First 
and Last Letter lengthened somewhat, but the differences were slight. 

All of the tests deal with verbal materials, but stress is placed on famili- 
arity with word structure rather than their meanings. Thurstone wrote, 
“The factor W seems to have as its principal characteristic a fluency in 
dealing with words. This factor seems to be separate from the verbal factor 
V, which is concerned with ideas and meanings." (3, 85). 

The term "Word Fluency" now seems unfortunate insofar as it implies
general verbal fluency, or ease of evoking words appropriate in meaning.
A new factor, revealed in the revised solution, fits this description better.
“Fluency with letters" seems more descriptive of factor W, implying easy
recognition and manipulation of letter patterns. Apparently some individuals
are able to recall words fluently or to construct them readily without neces-
sarily understanding their meanings very well.

Classification. The second of the two factors that Thurstone identified
tentatively he denoted I (Induction). There were thirty-six tests within
the range of ±.20, and the minimum value was -.110. In the new rotations
thirty-three projections remained within the diminishing range, and the
greatest negative projection was -.088; but the complexion of the factor
changed altogether.


CLASSIFICATION

                                Thurstone’s   Revised
Name of Test                    Solution      Solution

(55) Sound Grouping               .285          .635
(54) Rhythm                       .319          .573
( 8) Figure Classification        .405          .455
(45) Syllogisms                   .325          .429
(37) Number Series                .503          .396
(35) Tabular Completion           .479          .263
(29) Areas                      [illegible]     .262

Originally, Number Series, Tabular Completion, and Areas were most
heavily weighted. In the revised solution most of the variance of these
tests transferred to the factor called General Reasoning. Sound Grouping
and Rhythm picked up substantial weight, mostly from the Spatial and
Verbal Relations factors. Figure Classification and Syllogisms showed lesser
gains. The final position of the axis was changed so significantly that test
loadings in the two solutions are not directly comparable.

Since the two leading tests in the revised solution represent an attempt
to introduce material of auditory significance in paper-pencil form, the
factor might be interpreted as an ability to respond to auditory stimuli.
Such a conclusion would not explain the significant loadings of Figure Classi-
fication and Syllogisms, however. The tests do seem to have one common
feature, since in each a dichotomous classification has to be made rapidly.
In Sound Grouping, groups of four words are [...] alike, and one sounds
different. The task is to discriminate between the two kinds of words. In
Rhythm, each item includes four verses, three alike in rhythm and one
different. The examinee selects the one that is different. Figure
Classification poses a similar problem involving [...] medium. Each item
consists of eight unclassified [...] of four related symbols. The task is to
classify each of the symbols according to the [...] has a fairly significant
loading [...] the dichotomy represented [...] difficult to explain. It is
suggested that solving these [...] by an ability to [...] dichotomously, e.g.,
considering a pair of individuals [...] one as younger and one as older. The
tentative label "Classification" has been applied to this new factor. It is of
interest here to call attention to a doublet factor isolated by Holzinger and
Harman which they named “Rhythm.” Rhythm and Sound Grouping, both with
loadings of .53, were the only significantly weighted tests (4).



General Reasoning. Thurstone’s factor R (Restrictive Reasoning) had
twenty-eight column entries in the range of ±.20 with a minimum of -.101.
In the new solution, the factor with which R corresponded most closely
contained thirty-three diminishing values and a maximum negative projec-
tion of -.088.


GENERAL REASONING

                                Thurstone’s   Revised
Name of Test                    Solution      Solution

(39) Arithmetical Reasoning       .583          .642
(38) Numerical Judgment           .534          .604
(29) Areas                        .295          .523
(34) Division                     .352          .498
(35) Tabular Completion           .180          .491
(37) Number Series                .091          .437
(44) Pattern Analogies            .341          .411
(60) Vocabulary                   .545          .350
(25) Mechanical Movements         .414          .343
(56) Spelling                     .410          .239
(11) Completion                   .481          .154
(58) Vocabulary (Chicago)         .457         -.026

With their projections extended significantly, Arithmetical Reasoning and
Numerical Judgment head the new list. Other very significant changes
occurred. Areas, Division, Tabular Completion, and Number Series made
striking gains, picking up variance from Thurstone's Factor I. On the other
hand, the verbal tests, Vocabulary (Thorndike), Vocabulary (Chicago),
Completion, and Spelling, transferred large portions of their variance to the
Verbal factor.
Thurstone denoted this factor R, for restriction. “The common charac-
teristic seems to be the successful completion of a task that involves some
form of restriction in the solution.” (3, 88). Classifying the factor presented
difficulties. “The characteristic that is common to these tests, and to a lesser
degree in the tests with projections between .30 and .40, is not easy to de-
termine." (3, 88). It now seems quite possible that much of the difficulty
may have been that reasoning and verbal tests were clustered together.

The new rotations make a clean-cut interpretation possible. Since
the verbal element transferred to the verbal factor, the reasoning tests are
left alone to represent the factor which now corresponds quite directly to
the Army Air Force's "General Reasoning." Most tests remaining in the
cluster are of a numerical nature, but their pure numerical variance is
accounted for already in the Numerical factor. Pattern Analogies contains
non-numerical problems and Areas also is largely non-numerical.

Deduction. [...] that Thurstone only tentatively identified [...]
Deduction. Column D had thirty-three entries in the ±.20 range, and a
minimum of -.097. In the second set of rotations eight additional tests
entered the diminishing range, and the minimum was -.093. Again, the
loadings of the identifying tests tended to increase while those of tests with
lower loadings diminished, thus improving the structure.


DEDUCTION

                                Thurstone’s   Revised
Name of Test                    Solution      Solution

(42) False Premises               .578          .629
(40) Reasoning                    .525          .608
(37) Number Series                .287          .396
(25) Mechanical Movements         .403          .328
( 8) Figure Classification        .398          .226


The values for False Premises, Reasoning, and Number Series increased
significantly, drawing variance from scattered sources. Both Mechanical
Movements and Figure Classification transferred variance to the new Memory
for Observed Relationships factor, while the former also lent variance to
Visualization. Although “Its obvious common feature is the deductive
nature of the four tests” (1, 88), Thurstone lacked confidence in the interpre-
tation of Factor D because of the small number of significant loadings. The
new rotations did not help in this latter regard, since they tended to isolate
further False Premises and Reasoning, giving more of a doublet appearance
to the factor. Since both of these tests consist of syllogistic reasoning
problems, the new solution supports the identification of the factor as
deduction.


Verbal Fluency. Column 10 in the original rotated matrix was said to
represent a residual factor, even though it contained thirty-five values
within the ±.20 range with a minimum value of -.077, and five values above
.40, the maximum being .501. The new rotations added three diminishing
values, with a minimum value of -.137.


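VERBAL FLUENCY

                                Thurstone’s   Revised
Name of Test                    Solution      Solution

( 9) Controlled Association       .480          .666
(58) Vocabulary (Chicago)       [illegible]   [illegible]
(10) Inventive Opposites        [illegible]   [illegible]
(11) Completion                 [illegible]   [illegible]
(13) First and Last Letter      [illegible]   [illegible]
(15) Anagrams                   [illegible]   [illegible]
(21) Form Board                 [illegible]   [illegible]
(25) Mechanical Movements       [illegible]   [illegible]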

The new set of rotations made it possible to apply a meaningful interpre-
tation. Controlled Association extended its projection from .480 to .666,
borrowing the larger portion of this increased variance from the Verbal
Relations factor. The values for the Chicago Vocabulary test and Inventive
Opposites also increased significantly. The two tests that undoubtedly
caused difficulty in the identification of the original factor, Form Board and
Mechanical Movements, are dropped from the significant class, with the
major portion of their variance reallocated to the new Visualization factor.
The task in Controlled Association is to write as many words as possible
that are similar in meaning to a given stimulus word that has a highly general 
meaning allowing for a multiplicity of responses. The comparatively high 
loading of the Chicago Vocabulary test may be due to the fact that the 
stimulus word is presented in a short phrase, furnishing cues to its meaning 
from the subject matter. Fluent recall of trial words to fit the phrase should 
aid the subject to elicit correct responses. In Inventive Opposites the task 
is to supply two antonyms for a given test word. The Completion test 
emphasizes fluent verbal recall as opposed to the verbal recognition tapped 
by the five-choice Vocabulary tests. In First and Last Letters the subject 
must write as many words as he can beginning with one given letter and 
ending with another, while Anagrams requires the examinee to make as 
many words as possible, using only the letters of a given word. The heavily 
saturated tests now remaining apparently contain a single common element— 
fluency in dealing with words and sentences (in contrast to the original 
fluency factor, which has here been reinterpreted as fluency in dealing with 
letters). It is interesting to note that the last two tests named have more 
significant loadings on the Letter Fluency factor than they have on Verbal 
Fluency. 

Eduction of Relationships. Axis 11 was not rotated to positive mani- 
fold by Thurstone, although it apparently was used in some of the rotations, 
since there was more positive than negative weight represented. Thirty-
three entries were within the ±.20 range, negative values as high as -.39
were present, and three values exceeded .40. The new rotations yielded 
thirty-one loadings in the near-zero range with a minimum of —.124, and 
extended eight projections beyond .40. 
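The three figures quoted for each axis (the number of near-zero loadings, the most negative loading, and the number of loadings beyond .40) can be tallied mechanically from a column of the factor matrix. The following Python sketch shows one way to do so; the function name `summarize_column` and the ten loadings are hypothetical and are not taken from Table 1.

```python
def summarize_column(loadings, near_zero=0.20, significant=0.40):
    """Tally the simple-structure criteria for one factor column."""
    # Loadings inside the +/- near_zero band count as vanishing entries.
    n_near_zero = sum(1 for a in loadings if abs(a) <= near_zero)
    # The most negative loading measures departure from positive manifold.
    minimum = min(loadings)
    # Loadings above the cutoff are treated as significant.
    n_significant = sum(1 for a in loadings if a > significant)
    return n_near_zero, minimum, n_significant

# Hypothetical column of ten loadings (illustrative only).
column = [0.59, 0.53, 0.45, 0.31, 0.18, 0.07, 0.02, -0.05, -0.12, 0.21]
summary = summarize_column(column)
print(summary)
```

A rotation is judged an improvement on these criteria when the first count rises, the minimum moves toward zero, and the significant loadings grow on the tests that identify the factor.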


EDUCTION OF RELATIONSHIPS

                                Thurstone’s   Revised
Name of Test                    Solution      Solution

(41) Verbal Analogies           [illegible]     .592
(44) Pattern Analogies            .217          .526
( 8) Figure Classification        .269          .447
( 7) Word Grouping                .087          .446
(28) Copying                      .248          .437
(43) Code Words                   .326          .416
(25) Mechanical Movements         .286        [illegible]
(30) Number Code                [illegible]   [illegible]
(21) Form Board                   .281        [illegible]
(20) Flags                        .434        [illegible]
(53) Hands                        .467          .198



Originally, Hands, Number Code, and Flags, were the only tests with 
significant loadings. While one of these, Number Code, retained its position 
in the above-forty class, Flags and Hands had their loadings reduced to a 
near negligible level, with most of their variance transferring to the Spatial 
factor. In the meantime seven tests passed Number Code, gaining weight 
essentially as follows: Pattern Analogies and Word Grouping, from the 
Perceptual Speed factor; Verbal Analogies, from the Perceptual Speed and 
Verbal Relations factors; Mechanical Movements, from the Residual factors 
10 and 12; and Code Words and Copying, from scattered sources. 

This factor apparently transcends the nature of the subject matter,
with two verbal and two non-verbal tests in the leading roles. Verbal
Analogies and Pattern Analogies have a similar format. Both tests require
the subject first to determine the relationship between two stimuli, then
to make a choice of a stimulus that has the same relationship to a third
given stimulus. The first test uses words and the latter geometric figures.
Word Grouping and Figure Classification also test an ability to see relation-
ships among given stimuli. Again, one involves words and the other geometric
figures. The significant loading of Copying is of interest because, super-
ficially, the test appears to be entirely different from the others in the group;
yet, to solve the problems it is necessary to note relationships between a
given pattern and one to be copied. Code Words requires the determination
of English equivalents for words written in code. Problems are solved by
noting relationships among symbols that correspond to the relationships
among letters in words. Mechanical Movements is factorially complex,
but the ability to note recurring relationships, such as the fact that two
helical gears in mesh turn in opposite directions, is undoubtedly helpful.
Number Code items require an examinee to perform calculations using a
combination of Arabic numbers and code numbers. The relationships between
the two types of numbers have to be determined before the problems can be
solved. In Form Board the problem is to discover the relationship between
a group of small geometric figures and a larger figure that the small ones can
be [...] that the ability to educe relationships is at the core
of this factor. On few other factors has such a heterogeneous group of tests
been held together by a single common tie. It is pertinent to note at this
point that the first series of rotations leading to the new solution tended to
isolate Flags and Hands on Axis 11, thus corroborating the evidence of
the existence of AAF Space 2 (3, 885, 839), and Thurstone’s Factor K,
represented by Hands and Bolts (8). It became apparent, however, that
continued rotations in this direction would not permit some large negative
values to reduce satisfactorily. As a trial venture the poles of the axis were
reversed, and a much more satisfactory solution was achieved. The con-
figuration suggested strongly, however, that the extraction of another centroid
might well allow this second space factor to reveal itself and that much of
this variance would be drawn from the present Space factor. Thus, the
Space factor described in this article should probably be regarded, at least
provisionally, as a composite factor. (10)
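Reversing the poles of an axis amounts to changing the sign of every loading in that column. The following Python sketch illustrates why this can turn large negative values into positive ones without altering any test's variance on the axis; the function name `reflect` and the seven loadings are hypothetical, since the actual Axis 11 loadings at that stage of the rotations are not reproduced here.

```python
def reflect(column):
    """Reverse the poles of an axis by changing the sign of every loading."""
    return [-a for a in column]

# Hypothetical column with several large negative loadings (illustrative only).
column = [-0.47, -0.43, -0.39, 0.05, 0.10, -0.12, 0.31]
reflected = reflect(column)

# Squared loadings, and hence each test's variance on the axis, are unchanged.
same_variance = all(abs(a * a - b * b) < 1e-12 for a, b in zip(column, reflected))
print(min(column), min(reflected), same_variance)
```

In this sketch the most negative loading is smaller in absolute size after reflection, so fewer and smaller negative values remain to be reduced by later rotations.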

Visualization. Thurstone rotated Axis XII to positive manifold but 
did not interpret it. There were forty loadings in the ±.20 range with a
minimum value of -.096, and five loadings ≥ .40. The new set of rotations
retained thirty-nine entries in the diminishing range, with a minimum value
of -.129, and produced four ≥ .40.


VISUALIZATION

                                Thurstone’s   Revised
Name of Test                    Solution      Solution

(21) Form Board                   .397          .617
(24) Punched Holes                .527          .617
(19) Lozenges A                   .530          .587
(25) Mechanical Movements         .142          .396
(58) Vocabulary (Chicago)         .545          .308
( 5) Reading II                   .502          .162


The reason that Thurstone did not assign a unitary label to this factor 
may have been that the two verbal tests, Chicago Vocabulary, and Reading 
II, were loaded so heavily along with the three non-verbal tests, Lozenges A, 
Punched Holes, and Form Board. 

The additional rotations diverted a large proportion of the variance 
of the verbal tests to the verbal factor. Meanwhile, Form Board and Punched 
Holes increased their loadings strikingly. Mechanical Movements also 
gained substantially. Most of these increases in variance were due to rota-
tions with Residual Axis 10 and the Restrictive Reasoning factor.

In the Form Board test, the examinee is required to visualize how the 
separated parts of a geometric figure can be rearranged to fit the figure. 
In Punched Holes he must visualize the folding of a sheet of paper and the 
cutting of holes in the folded piece, and then recognize how the paper would 
look unfolded. For Lozenges A, the examinee must visualize a diamond- 
shaped card with a hole in one corner being picked up and turned over 
then placed with the prescribed edge downward. In Mechanical Move- 
ments visualization is probably an aid to the individual in setting the ma- 
chinery in motion mentally. 

Thurstone's description of spatial factor S, "facility in spatial and visual
imagery" (1, 80) seems to describe this new factor even better than it does
factor S. Thus, if the visualization label is adopted for this factor,
re-evaluation of the spatial factor is indicated.



The following are offered as hypotheses to distinguish between the
spatial and visualization factors.*

The first is that some problems can be solved either spatially or by
visualizing. When only a slight degree of turning or rotating is required
for an individual to orient himself with an external object, he is more likely
either to move himself or to feel himself adjust empathically, perhaps kines-
thetically, to the stimulus situation. If a greater degree of adjustment is
required, however, it might be beyond the individual’s power to empathize.
In order to bring himself and the object into alignment, he would have to
form a mental image of the object and then manipulate it into position.
For example, consider an individual looking at a house whose foundation is
askew. If, in orienting himself, he tends to line himself up with the house,
he is “spatializing.” If, on the other hand, he imagines the house rotated
back into an upright position, he is “visualizing.”

The second hypothesis, following from the first, is that space and visuali-
zation lie on a difficulty continuum. If visual stimuli must be rotated into
new positions, the simpler problems can be solved by spatial empathy,
whereas more difficult items involving several turns or rotations would evoke
visualization. At the extremes of the continuum the easiest items would
emphasize Perceptual Speed while the most difficult items would demand
Reasoning. Thus, by merely varying difficulty or complexity, the same
type of item could be used to measure four (or more) factors. This hypo-
thesis was put to test and the results are described in another paper (7).

The third hypothesis is that the spatial aspect of the spatial-visualiza-
tion process is the mere determination of the direction of action, whether it
is left or right, up or down, forward or back. In other words, it involves a
directional discrimination and choice reaction.

Memory for Observed Relationships. Axis XIII apparently has not
entered into any of Thurstone’s rotations. Fifty-one tests had loadings
between ±.20, and none was greater than .274.
*In a series of analyses of mechanical aptitudes, Thurstone has recently made compar-
able [...] the well-known and better established Space factor, [...] Visualization
and the AAF Mechanical Experience [...] visualization of rigid configuration moved
into different [...] within the configuration, and the [...] the first and second
hypotheses [...] above, would seem to represent [...]. It is here argued that the
[...]-flexible distinctions are functions of, and are merely incidental [...]. (See also
final paragraph under Eduction of Relationships.) [...] term “cybernetics” as “the
facility of discriminatory [...] direction of movement” would seem [...] a similar
hypothesis in [...]. [...] paper-pencil [...] designed with [...] mind did not appear
on the Spatial factor in Roff’s analysis of the Keesler Field Battery (9). Instead they
helped [...] factors (Complex Perception and Complex Reaction) and the variance on
[...] which headed Dean's Cybernetics factor and which appeared [...] on AAF Space I,
was confined to the Psychomotor factor. Obviously clear-cut [...] all the factors in
the Spatial-Visualization domain must await still further study.



The attempt was made in the new solution to achieve a psychologically
meaningful position for this axis, and in doing so the requirements of simple
structure and positive manifold were met fairly well. Forty-two entries
remained in the diminishing range of ±.20, while three values were raised
to above .40, two of them approaching .60.


MEMORY FOR OBSERVED RELATIONSHIPS

                                Thurstone’s   Revised
Name of Test                    Solution      Solution

(51) Picture Recall               .176          .591
(14) Disarranged Sentences        .204          .570
(52) Theme                        .172          .424


The test now leading the list picked up variance from a number of 
sources, but primarily from Perceptual Speed, and secondarily from Number 
and Memory. Disarranged Sentences also increased its variance considerably, 
borrowing chiefly from Verbal Relations, Numerical Operations, Memory, 
and Word Fluency. Theme gained its weighting primarily from residual 
Axes 11 and 12 and secondarily from Restrictive Reasoning, Word Fluency, 
and Memory. 

Interpreting this new factor is difficult, and only a tentative name
has been adopted for it. In Picture Recall a picture is studied for a limited 
period, after which questions must be answered regarding various details. 
On the basis of this test alone there is a strong suggestion of visual-memory— 
a welcome conclusion since a visual-memory factor has been identified in a 
number of previous studies, especially those performed by aviation psychol- 
ogists. Accounting for Disarranged Sentences, however, requires some 
explanation. The examinee has to rearrange words into their correct order
before he can answer the questions asked. Visual memory might be an aid, 
since the examinee with a strong ability to recall visually the construction of 
a sentence might be better equipped to set the words into their proper places. 
It is also possible to rationalize the presence of Theme, which has the third 
highest loading on the factor. The subject is asked to write about an ac- 
quaintance just as he might discuss him with a friend. Often enough this 
task involves description of physical characteristics as well as physical
activities where the ability to recall a visual picture should aid the writer.

However reasonable the foregoing explanations of the visual-memory
element in Disarranged Sentences and Theme may seem, the argument is
weakened by the absence of other tests which should appear by the same
token. Disarranged Words, for example, seemingly should involve the
same processes, but its loading is only .09. If pure visual memory were the
key the ability should be equally useful in reconstructing either sentences or
words. The disparate factor pattern suggests that some element of
meaningfulness is probably important in the former task that does not enter
into the latter.

In speculating regarding the possible breakdown that might be achieved 
in the area of memory through the application of factor methods, it may 
be recalled that Thurstone raised the question of whether there might be 
a factorial distinction between rote memory and memory for ideas (1, 86).
Possibly this new factor might stress the memory for ideas as opposed to
the rote memory measured by factor M. 

It is entirely possible that those who made good scores on the Picture 
Recall test did so by applying what might be termed “logical memory” (1, 86); 
that is, they observed the various relationships in the picture, perhaps
verbalizing them. As observing individuals, they would tend to look
for related elements, tying them together in a logical pattern as an aid to 
memory. Their ability to recall the proper order of words in sentences would 
be due to having concentrated upon word relationships. This ability would 
show up in Theme since observant people would be better able to recall 
accurately various characteristics of friends and acquaintances.

This latter ability seems to be a more nearly unique feature of the three
tests than the former, although it is always possible, of course, that other 
variance may be represented also. The tentative term “Memory for Observed 
Relationships" has been adopted awaiting clarification from other studies. 


REFERENCES 
1. Thurstone, L. L. Primary mental abilities. Psychometric Monogr., No. 1. Chicago:
Univ. Chicago Press, 1938.
2. Zimmerman, W. S. Visualization tests. In J. P. Guilford (Ed.), Printed classification
tests. Army Air Forces, Aviation Psychology Program Research Reports, Report No. 5.
Washington: U. S. Govt. Printing Office, 1947.
3. Guilford, J. P. Factorial picture of tests and criteria. In J. P. Guilford (Ed.), Printed
classification tests. Army Air Forces, Aviation Psychology Program Research Reports,
Report No. 5. Washington: U. S. Govt. Printing Office, 1947.
4. Zimmerman, W. S. A simple graphical method for orthogonal rotation of axes.
Psychometrika, 1946, 11, 51-56.
5. Holzinger, K. J., and Harman, H. H. Comparison of two factor analyses.
Psychometrika, 1938, 3, 45-60.
6. Thurstone, L. L. An analysis of mechanical aptitudes. Psychometric Laboratory
Report, No. 62. University of Chicago, 1951.
7. Zimmerman, W. S. The influence of changes in item complexity upon the factor
composition of a spatial-visualization test. Educ. psychol. Measmt., in press.
8. Degan, J. W. A reanalysis of the Army Air Force battery of mechanical tests.
Psychometric Laboratory Report, No. 58. University of Chicago, 1950.
9. Roff, M. F. Personnel selection and classification procedures: Perceptual tests, a factor
analysis. School of Aviation Medicine, Project Report, April, 1950.
10. Zimmerman, W. S. A note on the recognition and interpretation of composite factors.
Psychol. Bull., in press.


Manuscript received 6/26/52
Revised manuscript received 8/23/52


BOOK REVIEW 


GERHARD TINTNER. Econometrics. John Wiley and Sons, N.Y., 1952, pp. xiii + 370, $5.75.


Gerhard Tintner, fellow of the Econometric Society, the Institute of Mathematical
Statistics, and the American Statistical Association, has completed the useful task of
providing the first textbook on modern methods of econometrics. Other volumes in the
field are not competitive with Professor Tintner's book, in that they do not cover the
methodological contributions of the past decade.

Econometrics deals with a general discussion and illustration of the subject, the
application of multivariate statistical methods to economic data, econometric model
construction, and a study of time series analysis. In an appendix the author gives a brief
discussion of matrices, determinants, and computational methods. Numerous examples
are given throughout to illustrate the techniques developed. The subject matter is
mathematical in nature and kept on a high plane without being made oppressively rigorous.

Two outstanding contributions of the book are the discussions of multivariate
statistical methods and certain aspects of time series analysis. In the section on
multivariate methods, Tintner introduces the reader to such topics as multiple regression,
discriminant analysis, principal components, canonical correlation, and weighted
regression. Economics students are not especially familiar with these methods and will
find Tintner's exposition useful. Unfortunately, he did not choose a set of illustrative
examples that will simultaneously be instructive to other social scientists. There is great
need for the application of multivariate methods to the analysis of survey data involving
large samples of individual respondents and numerous personal variables. Had he chosen
examples from economic data collected in surveys, he would simultaneously have struck
a note appealing to psychologists and sociologists.

A substantial portion of the section on time series analysis covers quite conventional
material on the measurement of trend and of seasonal and cyclical variation in economic
activity. These matters are discussed in most elementary texts on economic and business
statistics, although in a less mathematical form. Well-known mathematical techniques
of fitting orthogonal polynomials, logistic curves, and Fourier series; smoothing by moving
averages; and periodogram analysis are included. The treatment of serial correlation,
stochastic difference equations, autoregressive schemes, and correlogram analysis is more
interesting and less familiar.

Cleavages exist among econometricians, and Tintner's approach to the subject is
one that fails to capture what the reviewer regards as the singular contribution of
econometrics to methods of social science research. A feature of econometric methods not found
in psychometrics or empirical studies in other social sciences is the systematic blending
of a priori information and empirical observation. Social science investigations often
proceed by purely empirical methods of reasoning. Data are searched for regularities
and high correlations. The alternative approach of some econometricians is to use a priori
information such as economic theory, institutional practices, legal restrictions, and
technological information to fashion a mathematical model of the economy. It is in the use
of the economic theory of behavior to formulate testable hypotheses that other social
scientists could possibly derive some benefit from a study of econometric methods. The
a priori information of all types serves to define the class of variables being considered
and many specifications about the mathematical form of the relationships used. The
latter specifications are, however, seldom complete; hence simple functional forms are
widely used to expedite computational and other analytical efforts. Econometric models
constructed on the basis of a priori reasoning are then confronted with statistical
observations. The structural characteristics, the parameters, of the model are estimated from
the data and identified with basic economic concepts.

Another outstanding characteristic of modern econometrics is that the stochastic 
properties of models are explicitly developed at the outset of analysis. Tintner fully 
presents this aspect but gives inadequate attention to the choice between two main alter- 
native stochastic models. One model assumes that individual variables are subject to 
error, say measurement error, while another assumes that behavior is subject to error, 
say through the neglect of explicit treatment of minutiae, rare events, and nonmeasurable 
quantities. Tintner implicitly tells the reader that both errors in variables and errors in 
equations are present, that there are inherent statistical difficulties in using a stochastic 
model based on both types of error, and that therefore we must arbitrarily assume one 

model or the other. His preference visibly is for the error-in-variable model. He fails 
to emphasize for the reader that it is, in principle, possible to obtain accurate measurements, 
that we are moving in the direction of better and better statistical measurement of economic 
data; and that in systems involving large numbers of individuals making free choices, 
behavioral disturbances are inevitable. It is virtually inconceivable to imagine social 
behavior of individuals that could be described completely by a set of measurable variables 
that the human mind of an investigator can simultaneously manipulate. The reviewer 
has a distinct preference for models whose stochastic structure permits explicit disturbance 
of behavior (errors in equations) and feels that other social science studies should use a 
similar probability scheme. Tintner devotes a chapter to rather formal calculations 
with errors-in-equation models. In this respect his book is inadequate. 

Tintner makes a happy use of examples to illustrate his methods, and this, in itself, 
adds greatly to the pedagogical contribution. The examples are not, however, well chosen 
to bring out the best of econometrics. The reader may get the impression that the subject 
is not to be taken too seriously, because Tintner frequently summarizes the results of 
an example by warning the reader to accept the findings only with the greatest of caution 
due to the fact that a number of assumptions are probably not fulfilled. There are actual
empirical studies which are to be taken seriously and in which careful programming attempts
to fulfill the underlying assumptions. Tintner's attitude is overly negativistic, but he
could have made a better choice of examples by selecting those yielding results in which
he could have some faith and about which he would not have to be apologetic. A smaller
number of examples, elaborate enough and penetrating enough to show what econometrics
can truly accomplish, would have been preferable. Students may wonder, after having
worked through Tintner's text, what would be an acceptable econometric investigation.
Is a subject mature enough to warrant a textbook if the accomplishments are no less
subject to criticism than Tintner's examples? Surely Tintner cannot feel that empirical 


econometric studies are as weak as he leads one to believe his examples are; otherwise 
he would be in another profession. 


1 perficial analysis, 
gives off-hand explanations; 
This suggests that these 


m in the book, 
Tintner’s style is not pleasing, in that his pages are cluttered with 


erfluous or irrelevant. 


University of Michigan L. R. Klein


PSYCHOMETRIKA—VOL. 18, No. 2
JUNE, 1953


ESTIMATION OF THE CORRELATION COEFFICIENT IN THE 
CASE OF A BIVARIATE NORMAL POPULATION WHEN ONE OF 
THE VARIABLES IS DICHOTOMIZED* 


J. S. MARITZ 


SOUTH AFRICAN COUNCIL FOR SCIENTIFIC AND INDUSTRIAL RESEARCH 


It is shown that the problem of estimation of the correlation coefficient
of a bivariate normal population when one of the variables is dichotomized
may be attacked with "probit analysis" methods. This represents an
extension of the work of Gillman and Goode (3), as it was possible to find by
this approach an approximation to the large-sample variance of the resulting
estimate $G$ of $\rho$. An empirical investigation was undertaken with the object
of obtaining some information about the distribution of $G$ for large sample
size. Methods for determining the "pass-fail" cut-off are considered.

1. Introduction. It is often necessary to estimate the correlation
coefficient between two variates when one of them is dichotomized. If it is
reasonable to assume that the joint distribution of these variates is normal,
the statistic $r_{bis}$ (biserial $r$) is commonly used as an estimator of the parameter
$\rho$. It is well-known that when the variate which is not truncated (which
we will call the "continuous" variate) is restricted in some way, then $r_{bis}$ is
no longer a consistent estimate of $\rho$ (4). Gillman and Goode (3) have suggested
an alternative procedure resulting in the estimator $G$ of $\rho$ which appears
to have none of the disadvantages of $r_{bis}$. It has been pointed out by Sichel (4)
that the weights used by these authors for fitting their regression line are
not strictly the best and that it would add considerably to the usefulness of
this method if something were known about the sampling distribution of $G$.
It is our object to state some results which were the outcome of an
investigation of these points.


2. Case of No Restriction

2.1. We will denote the "continuous" variate by $x$ and the dichotomized
variate by $y$. It will be assumed that there is no restriction on $x$ and that
the $x$ distribution is normal with mean 0 and variance 1; also that

$$p(y \mid x)\,dy = \frac{1}{\sqrt{2\pi}} \exp\left[-\tfrac{1}{2}(y - bx)^2\right] dy. \tag{1}$$


*The author wishes to thank the South African Council for Scientific and Industrial
Research for permission to publish this paper. The invaluable assistance of Mr. H. S.
Sichel in the preparation of this manuscript is also gratefully acknowledged.


If the coefficient of correlation between $x$ and $y$ is $\rho$ it follows immediately
that the marginal $y$ distribution is normal with mean 0 and variance $1/(1 - \rho^2)$,
and

$$b = \frac{\rho}{\sqrt{1 - \rho^2}} \quad \text{or} \quad \rho = \frac{b}{\sqrt{1 + b^2}}, \tag{2}$$

this being the result which Gillman and Goode have used. If an estimate $\hat{b}$ of
$b$ can be found it seems reasonable to take as estimate of $\rho$ the quantity:

$$G = \frac{\hat{b}}{\sqrt{1 + \hat{b}^2}}. \tag{3}$$


We want to suggest a procedure for estimating b which differs slightly 
from that of Gillman and Goode and which is simply an application of the 
probit analysis technique (1) to this problem. 
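
As a numerical check on equations (2) and (3), the conversion between the slope and the correlation can be sketched in Python. The function names `b_from_rho` and `g_from_b` are illustrative assumptions and do not come from the paper.

```python
import math


def b_from_rho(rho: float) -> float:
    """Slope b implied by a correlation rho, following equation (2)."""
    return rho / math.sqrt(1.0 - rho * rho)


def g_from_b(b_hat: float) -> float:
    """Estimate G of rho obtained from an estimated slope, following equation (3)."""
    return b_hat / math.sqrt(1.0 + b_hat * b_hat)


# Round trip: converting rho to b and back recovers the starting value.
print(b_from_rho(0.6), g_from_b(b_from_rho(0.6)))
```

Applied to the slope .7358 found in the example of 2.8, `g_from_b` reproduces the estimate of about .593 quoted there.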

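2.2. Observations of which the $y$ values are greater than or equal to $c$,
say, will for convenience be called "successes." It follows from (1) that the
proportion of successes at a particular $x$ value will be

$$\pi_x = \int_{c}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left[-\tfrac{1}{2}(y - bx)^2\right] dy, \tag{4}$$

which we may rewrite as

$$\pi_x = \int_{c - bx}^{\infty} \operatorname{erf}(u)\,du, \tag{5}$$

where

$$\operatorname{erf}(u) = \frac{1}{\sqrt{2\pi}} \exp\left[-\tfrac{1}{2}u^2\right].$$

If we now define a "probit," $Y$, by the relation

$$\pi_x = \int_{-Y + 5}^{\infty} \operatorname{erf}(u)\,du, \tag{6}$$

we have

$$Y = 5 - c + bx. \tag{7}$$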
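
The probit relation can be illustrated with a short Python sketch using the standard library's `statistics.NormalDist`. The helper names and the values of `b`, `c`, and `x` below are hypothetical, chosen only to show that the probit of the success proportion is linear in `x`, as equation (7) states.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def success_proportion(x: float, b: float, c: float) -> float:
    """Upper tail of the standard normal beyond c - b*x, as in equation (5)."""
    return 1.0 - STD_NORMAL.cdf(c - b * x)


def probit(proportion: float) -> float:
    """Probit of a proportion strictly between 0 and 1: normal deviate plus 5."""
    return STD_NORMAL.inv_cdf(proportion) + 5.0


b, c, x = 0.75, 0.625, 1.2  # hypothetical values for illustration only
# The two printed numbers agree, which is the content of equation (7).
print(probit(success_proportion(x, b, c)), 5.0 - c + b * x)
```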
2.3. In practice we do not usually work with the proportion of successes
at an exact $x$-level. The observations in a certain $x$-interval are grouped
together, and we write

$$\pi_i = \int_{c - b\bar{x}_i}^{\infty} \operatorname{erf}(u)\,du,$$

where $\pi_i$ is the proportion of successes in the $i$th $x$-interval and $\bar{x}_i$ is the
mean of that interval. Evidently a grouping error is introduced, and this
problem has been fully discussed by K. D. Tocher (5). However, the grouping
error is not very large and may be reduced by choosing a finer grouping, so
that in what follows we will disregard the grouping error, thus simplifying
the discussion somewhat.
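2.4. If our $x$-scale is divided into $k$ intervals, the proportion of the
total population in the $i$th interval being $P_i$, then the probability that in a
sample of size $N$, $n_1$ observations occur in the first interval, $n_2$ in the second
and so on, is given by the multinomial distribution. If the number of
"successes" in the $i$th interval is $m_i$ (out of a possible $n_i$), then we have

$$p(m_i \mid n_i) = \binom{n_i}{m_i} \pi_i^{m_i} (1 - \pi_i)^{n_i - m_i}. \tag{8}$$

Making the reasonable assumption that

$$p(m_1, m_2, \ldots, m_k \mid n_1, \ldots, n_k) = p(m_1 \mid n_1) \times p(m_2 \mid n_2) \times \cdots \times p(m_k \mid n_k), \tag{9}$$

we have

$$p(m_1, \ldots, m_k, n_1, \ldots, n_k \mid N) = p(m_1 \mid n_1) \times \cdots \times p(m_k \mid n_k) \times p(n_1, \ldots, n_k \mid N) = e^{L}, \tag{10}$$

say, from which follows

$$L = \log \frac{N!}{\prod n_i!} + \sum n_i \log P_i + \sum \log \binom{n_i}{m_i} + \sum m_i \log \pi_i + \sum (n_i - m_i) \log (1 - \pi_i). \tag{11}$$

If $\theta$ is a parameter not contained in $P_i$ then

$$\frac{\partial L}{\partial \theta} = \sum \frac{n_i (p_i - \pi_i)}{\pi_i (1 - \pi_i)} \frac{\partial \pi_i}{\partial \theta}, \tag{12}$$

where $p_i = m_i / n_i$.

The parameters which we want to estimate are [see equation (7)]
$a = 5 - c$ and $b$, so that, since

$$\frac{\partial \pi_i}{\partial a} = \operatorname{erf}(-Y_i + 5), \qquad \frac{\partial \pi_i}{\partial b} = \bar{x}_i \operatorname{erf}(-Y_i + 5),$$

we have the equations

$$\frac{\partial L}{\partial a} = \sum \frac{n_i (p_i - \pi_i)}{\pi_i (1 - \pi_i)} \operatorname{erf}(-Y_i + 5) = 0,$$
$$\frac{\partial L}{\partial b} = \sum \frac{n_i (p_i - \pi_i)}{\pi_i (1 - \pi_i)}\, \bar{x}_i \operatorname{erf}(-Y_i + 5) = 0, \tag{13}$$

which may be used to find the estimates $\hat{a}$ and $\hat{b}$ of $a$ and $b$.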


These equations are identical in form with the usual “probit” equations 
and may consequently be solved in exactly the same way. The mathematical 
details of the method of solution are given in Finney's book (1) and will 
not be repeated here, but an example will be given in 2.8 to illustrate how 


the solution is carried out in practice. 


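2.5. We may now obtain the large-sample variances and the covariance
of the estimates $\hat{a}$ and $\hat{b}$ of $a$ and $b$. We have

$$\begin{bmatrix} \operatorname{var}(\hat{a}) & \operatorname{cov}(\hat{a}, \hat{b}) \\ \operatorname{cov}(\hat{a}, \hat{b}) & \operatorname{var}(\hat{b}) \end{bmatrix}^{-1} = - \begin{bmatrix} E\left(\dfrac{\partial^2 L}{\partial a^2}\right) & E\left(\dfrac{\partial^2 L}{\partial a\,\partial b}\right) \\ E\left(\dfrac{\partial^2 L}{\partial a\,\partial b}\right) & E\left(\dfrac{\partial^2 L}{\partial b^2}\right) \end{bmatrix}.$$

It follows from (12) that

$$\frac{\partial^2 L}{\partial a^2} = \sum \left\{ \frac{n_i (p_i - \pi_i)}{\pi_i (1 - \pi_i)} \frac{\partial^2 \pi_i}{\partial a^2} - \frac{n_i}{\pi_i (1 - \pi_i)} \left(\frac{\partial \pi_i}{\partial a}\right)^2 - \frac{n_i (p_i - \pi_i)(1 - 2\pi_i)}{\pi_i^2 (1 - \pi_i)^2} \left(\frac{\partial \pi_i}{\partial a}\right)^2 \right\}, \tag{14}$$

giving, since $E(p_i) = \pi_i$ and $E(n_i) = N P_i$,

$$E\left(\frac{\partial^2 L}{\partial a^2}\right) = -N \sum_{i=1}^{k} P_i w_i, \tag{15}$$

where

$$w_i = \frac{[\operatorname{erf}(-Y_i + 5)]^2}{\pi_i (1 - \pi_i)}.$$

Similarly,

$$E\left(\frac{\partial^2 L}{\partial a\,\partial b}\right) = -N \sum_{i=1}^{k} P_i w_i \bar{x}_i, \qquad E\left(\frac{\partial^2 L}{\partial b^2}\right) = -N \sum_{i=1}^{k} P_i w_i \bar{x}_i^2. \tag{16}$$

We may derive, using (14), (15), and (16), the results

$$\operatorname{var}(\hat{a}) = \frac{1}{N \sum P_i w_i} + \frac{\bar{x}^2}{N \sum P_i w_i (\bar{x}_i - \bar{x})^2},$$
$$\operatorname{var}(\hat{b}) = \frac{1}{N \sum P_i w_i (\bar{x}_i - \bar{x})^2}, \tag{17}$$
$$\operatorname{cov}(\hat{a}, \hat{b}) = \frac{-\bar{x}}{N \sum P_i w_i (\bar{x}_i - \bar{x})^2},$$

where

$$\bar{x} = \frac{\sum P_i w_i \bar{x}_i}{\sum P_i w_i}.$$

2.6. From equation (3) it follows that, approximately for large $N$,

$$\operatorname{var}(G) = \frac{\operatorname{var}(\hat{b})}{(1 + b^2)^3}, \tag{18}$$

where $\operatorname{var}(\hat{b})$ is given by (17).

If we consider the grouping of the $x$-variate to become finer we see that

$$\sum_{i=1}^{k} P_i w_i (\bar{x}_i - \bar{x})^2$$

tends to

$$\int_{-\infty}^{\infty} \operatorname{erf}(x)\, \frac{[\operatorname{erf}(c - bx)]^2}{g(c - bx)\,[1 - g(c - bx)]}\, (x - \bar{x})^2\, dx,$$

where

$$g(c - bx) = \int_{c - bx}^{\infty} \operatorname{erf}(u)\,du,$$

and

$$\bar{x} = \frac{\displaystyle\int_{-\infty}^{\infty} \operatorname{erf}(x)\, \frac{[\operatorname{erf}(c - bx)]^2}{g(c - bx)\,[1 - g(c - bx)]}\, x\, dx}{\displaystyle\int_{-\infty}^{\infty} \operatorname{erf}(x)\, \frac{[\operatorname{erf}(c - bx)]^2}{g(c - bx)\,[1 - g(c - bx)]}\, dx}.$$

We may therefore say that, asymptotically,

$$\operatorname{var}(\hat{b}) = \frac{1}{N} \left[ \int_{-\infty}^{\infty} \operatorname{erf}(x)\, \frac{[\operatorname{erf}(c - bx)]^2}{g(c - bx)\,[1 - g(c - bx)]}\, (x - \bar{x})^2\, dx \right]^{-1}, \tag{19}$$

while similar expressions hold for $\operatorname{var}(\hat{a})$ and $\operatorname{cov}(\hat{a}, \hat{b})$. Expressions (17)
therefore give "grouped" approximations to the true asymptotic variances
and covariance of $\hat{a}$ and $\hat{b}$.

In practice we must always use (17), and since we do not know any of
the parameters, we must substitute in the expressions their estimates
obtained from the sample. Thus we use, for example,

$$\text{est. } \operatorname{var}(\hat{b}) = \frac{1}{\sum n_i \hat{w}_i (\bar{x}_i - \bar{x})^2}, \tag{20}$$

remembering that this expression gives a "grouped" estimate of $\operatorname{var}(\hat{b})$. The
calculation of this quantity will be illustrated in our example of 2.8.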


2.7. An interesting comparison may be made between the sampling
variances of the estimators $G$ and $r_{bis}$ of $\rho$ for large samples. If we denote by
$\beta$ the cut-off between "successes" and "failures" on the $y$-variate in
"standard measures," i.e.,

$$\beta = c\sqrt{1 - \rho^2} = \frac{c}{\sqrt{1 + b^2}} = \frac{5 - a}{\sqrt{1 + b^2}}, \tag{21}$$

then $N \operatorname{var}(G)$ [cf. equation (19)] and $N \operatorname{var}(r_{bis})$ may be tabulated for
various values of $\rho$ and $\beta$. We have calculated expression (19) approximately
by grouping into very fine intervals and summing, and Table 1 gives $N \operatorname{var}(G)$
and $N \operatorname{var}(r_{bis})$.


TABLE 1
N var(G) and N var(r_bis) for Various Values of ρ and β

              ρ = .4                    ρ = .6                    ρ = .8
  β     N var(G)  N var(r_bis)    N var(G)  N var(r_bis)    N var(G)  N var(r_bis)
  0      1.135      1.195           .704       .798           .271       .376
 .5      1.256      1.327           .785       .906           .311       .452
1.0      1.705      1.822          1.095      1.322           .438       .757
1.5      2.881      3.125          1.910      2.458           .795      1.658


Evidently $G$ is a "more efficient" estimate of $\rho$ than $r_{bis}$ when the
population from which we sample is not restricted. There are, however, other
advantages attached to the use of this analysis and these will be mentioned
in some of the following paragraphs.
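
The "very fine grouping" evaluation of expression (19) can be approximated with a short Python sketch. This is an illustration under stated assumptions, not the author's computation: the function names, the integration range of plus or minus 8 standard deviations, and the number of steps are all choices made here.

```python
import math


def density(u: float) -> float:
    """Standard normal density, written erf(u) in the paper's notation."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def upper_tail(u: float) -> float:
    """Integral of the standard normal density from u to infinity."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))


def n_var_g(rho: float, beta: float, half_width: float = 8.0, steps: int = 4000) -> float:
    """Approximate N * var(G) from equations (18) and (19) by summing over fine x-intervals."""
    b = rho / math.sqrt(1.0 - rho * rho)   # equation (2)
    c = beta * math.sqrt(1.0 + b * b)      # equation (21) solved for c
    h = 2.0 * half_width / steps
    s0 = s1 = s2 = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        u = c - b * x
        pq = upper_tail(u) * upper_tail(-u)  # pi * (1 - pi) at this x
        if pq == 0.0:
            continue  # the weight is negligible this far out in the tails
        weight = density(x) * density(u) ** 2 / pq * h
        s0 += weight
        s1 += weight * x
        s2 += weight * x * x
    sxx = s2 - s1 * s1 / s0  # weighted sum of squares about the weighted mean
    return 1.0 / sxx / (1.0 + b * b) ** 3


for beta in (0.0, 0.5, 1.0, 1.5):
    print(beta, round(n_var_g(0.6, beta), 3))
```

The printed values can be compared with the N var(G) column for ρ = .6 in Table 1.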

2.8. For our example we have a sample of 200 individuals drawn from
a bivariate normal distribution, given that the $x$-variate has expected value
zero. This example is typical of cases when some standardized test is compared
with a "pass-fail" criterion. The data are given in Table 2.

The procedure is to calculate, first, the proportion of "successes" in
each "cell" and then to transform these proportions (or percentages) to
empirical probits with the help of tables (1). These probits are then plotted
against the $x$-values and a first approximation to the regression line is drawn
"by eye." From the provisional line the $Y$-values are read off; they are the
ordinates on the line corresponding to the $X$-values. These $Y$-values and the
percentages of "successes" are used to find the weighting coefficients ($nw$) and
the "working" probits. Tables are available from which these quantities
may be read off quite easily (1, 2).
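
Where the tables are not at hand, the weighting coefficient and working probit for a class can be computed directly. The sketch below assumes the standard probit-analysis formulas, w = Z²/(PQ) and y = Y + (p − P)/Z, where P and Z are the normal probability and ordinate at the provisional probit; the paper itself reads these quantities from tables, and the function name is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def weight_and_working_probit(provisional_y: float, n: int, p: float):
    """Return (n*w, working probit) for one class.

    provisional_y: probit read from the provisional line
    n: class frequency
    p: observed proportion of successes in the class
    """
    deviate = provisional_y - 5.0
    z = STD_NORMAL.pdf(deviate)           # ordinate at the provisional probit
    big_p = STD_NORMAL.cdf(deviate)       # expected proportion of successes
    w = z * z / (big_p * (1.0 - big_p))   # weighting coefficient
    working = provisional_y + (p - big_p) / z
    return n * w, working


# Class with mean X = .979 in the example: Y = 5.04, n = 24, p = .62.
nw, y = weight_and_working_probit(5.04, 24, 0.62)
print(round(nw, 2), round(y, 2))
```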


TABLE 2
Illustrative Example

                   Class   Class    No. of   Prop. of  Empir-  Provi-            Work-
Class              Mean    Fre-     Suc-     Suc-      ical    sional            ing
Interval           X       quency   cesses   cesses    Probit  Probit            Probit
                           n        m        p                 Y         nw      y         nwX
 2.25 to  2.75     2.446     3        2       .67       5.44    6.16     1.15    5.14     2.81290
 1.75 to  2.25     1.961     7        5       .71       5.55    5.79     3.54    5.53     6.94194
 1.25 to  1.75     1.468    15       11       .73       5.61    5.40     9.01    5.60    13.22668
  .75 to  1.25      .979    24       15       .62       5.31    5.04    15.27    5.30    14.94933
  .25 to   .75      .490    35       11       .31       4.50    4.67    21.41    4.51    10.49090
 -.25 to   .25      .000    43       14       .33       4.56    4.25    22.25    4.60      .00000
 -.75 to  -.25     -.490    27        2       .07       3.52    3.90    10.93    3.60    -5.35570
-1.25 to  -.75     -.979    28        2       .07       3.52    3.49     7.44    3.53    -7.28376
-1.75 to -1.25    -1.468     9        1       .11       3.77    3.18     1.46    4.28    -2.14328
-2.25 to -1.75    -1.961     4        0       .00        -∞     2.78      .32    2.35     -.62752
-2.75 to -2.25    -2.446     5        0       .00        -∞     2.35      .18    2.01     -.44028

Totals                     200       63


The new regression line is found by fitting a "least-squares" straight
line to the "working" probits with weights $nw$. For this purpose we require

$$\sum nw = 92.96,$$
$$\sum nwX = 32.5712,$$
$$\sum nwy = 428.7571,$$
$$\sum nwXy = 196.9369,$$
$$\sum nwX^2 = 74.8951.$$

We then have

$$\hat{b} = \frac{\sum nwXy - \dfrac{(\sum nwX)(\sum nwy)}{\sum nw}}{\sum nwX^2 - \dfrac{(\sum nwX)^2}{\sum nw}} = \frac{46.7007}{63.4828} = .7358,$$

so that $G = .5927$.
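
The weighted least-squares step can be reproduced from the five sums with a few lines of Python. The function name `weighted_line` is an illustrative assumption; the intercept formula used is the usual weighted-mean form, and the inputs are the sums quoted above.

```python
import math


def weighted_line(s_w, s_wx, s_wy, s_wxy, s_wxx):
    """Weighted least-squares intercept and slope from the five weighted sums."""
    slope = (s_wxy - s_wx * s_wy / s_w) / (s_wxx - s_wx ** 2 / s_w)
    intercept = s_wy / s_w - slope * s_wx / s_w
    return intercept, slope


a_hat, b_hat = weighted_line(92.96, 32.5712, 428.7571, 196.9369, 74.8951)
g = b_hat / math.sqrt(1.0 + b_hat ** 2)  # equation (3)
print(round(a_hat, 4), round(b_hat, 4), round(g, 4))
```

Small differences in the last decimal place relative to the printed figures are to be expected from rounding in the hand calculation.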


We also find

$$\hat{a} = \frac{\sum nwy}{\sum nw} - \hat{b}\, \frac{\sum nwX}{\sum nw} = 4.3545,$$

and we can plot the regression line

$$Y = 4.3545 + .7358X,$$


this line being a second approximation to the true estimate of the population 
regression line which this sample provides. The empirical probits and the 
two lines are plotted in Figure 1. The lines are quite close together and it 
appeared that a further “cycle” (starting with the line calculated from the 
"eye" line) was not necessary. As a rough guide it may be assumed that no 
further cycle is required if the difference in slope between the two lines is
less than five degrees, provided that X is in standard scores (i.e., S.D. = 1),
that the Y's are "probits," and that, when plotting, a unit is represented by
the same length on the X and Y scales.


FIGURE 1
"Eye Fit" and Calculated Approximation to Regression Line


We may now find estimates of $\operatorname{var}(\hat{b})$ and $\operatorname{var}(G)$. Using equation (20)
we have

$$\text{est. } \operatorname{var}(\hat{b}) = \left[ \sum nwX^2 - \frac{(\sum nwX)^2}{\sum nw} \right]^{-1} = .01575,$$

and from (18)

$$\text{est. } \operatorname{var}(G) = \frac{1}{(1 + \hat{b}^2)^3}\, [\text{est. } \operatorname{var}(\hat{b})] = .00430.$$

Thus

$$\text{est. S.E.}(G) = .066,$$

so that finally our estimate of $\rho$ is

$$.593 \pm .066.$$

At this point there arises the question of the shape of the distribution
of $G$ for a sample size of two hundred. A theoretical investigation has not been
carried out, but certain practical work (of which more in 2.10) has shown
that the distribution of $G$ may, quite reasonably, be assumed normal for $N$
as large as 200.
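
The two variance estimates can be checked numerically. The sketch below applies equation (20), in the form one over the weighted sum of squares of X about its weighted mean, and then equation (18); the function name is hypothetical and the inputs are the figures from the example.

```python
import math


def var_b_and_var_g(sxx: float, b_hat: float):
    """Estimated var(b) by equation (20) and var(G) by equation (18).

    sxx is the weighted sum of squares of X about its weighted mean.
    """
    var_b = 1.0 / sxx
    var_g = var_b / (1.0 + b_hat ** 2) ** 3
    return var_b, var_g


var_b, var_g = var_b_and_var_g(63.4828, 0.7358)
print(round(var_b, 5), round(var_g, 5), round(math.sqrt(var_g), 3))
```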
2.9. After calculating the regression line we may carry the analysis a
little further.

(a) If equation (1) holds true it may be shown that the quantity (where
$\hat{\pi}_i$ is found from the regression line),

$$\chi^2 = \sum \frac{(m_i - n_i \hat{\pi}_i)^2}{n_i \hat{\pi}_i (1 - \hat{\pi}_i)}$$

is distributed approximately as $\chi^2$ with $k - 2$ degrees of freedom if each
$n_i$ is not small and $N$ is large. This then provides a test of the assumption (1).


TABLE 3
Chi-Square Test of Assumption in Equation (1)

Y = 4.3545 + .7358X     π̂        nπ̂       m      m - nπ̂     (m - nπ̂)² / [nπ̂(1 - π̂)]
        6.15          .8749      2.62      2
        5.80          .7881      5.52      5
        5.43          .6664     10.00     11
      (pooled)                  18.14     18      -.14          .004
        5.07          .5279     12.68     15     +2.32          .899
        4.72          .3897     13.64     11     -2.64          .837
        4.35          .2578     11.09     14     +2.91         1.029
        3.99          .1562      4.22      2     -2.22         1.384
        3.63          .0853      2.39      2      -.39          .070
        3.27          .0418       .38      1
        2.91          .0183       .07      0
        2.55          .0071       .04      0
      (pooled)                    .49      1      +.51          .546

Total                                                          4.769


For our example the calculation of $\chi^2$ is shown in Table 3. It will be
noticed that the "tails" are grouped to form cells with $n_i = 18$ and $n_i = 25$.
It is suggested that the smallest $n_i$ should be close to twenty. Since for this
example we find $\chi^2 = 4.769$ for 5 d.f., we may accept (1).
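
The chi-square calculation with pooled tails can be sketched in Python as follows. The treatment of a pooled cell, taking its expected proportion as total expected successes divided by total frequency, is an assumption made here; it is consistent with the pooled entries in Table 3 but is not spelled out in the text. Because the sketch works from the four-decimal proportions rather than the rounded expected frequencies, the total differs slightly from the tabled 4.769.

```python
def pooled_chi_square(cells):
    """Sum of (m - n*pi)^2 / (n*pi*(1 - pi)) over cells.

    Each cell is a list of (n, m, pi_hat) classes that are pooled together.
    """
    total = 0.0
    for classes in cells:
        n = sum(k[0] for k in classes)
        m = sum(k[1] for k in classes)
        expected = sum(k[0] * k[2] for k in classes)
        pi = expected / n  # pooled expected proportion (assumption)
        total += (m - expected) ** 2 / (expected * (1.0 - pi))
    return total


cells = [
    [(3, 2, 0.8749), (7, 5, 0.7881), (15, 11, 0.6664)],  # upper tail, pooled n = 25
    [(24, 15, 0.5279)],
    [(35, 11, 0.3897)],
    [(43, 14, 0.2578)],
    [(27, 2, 0.1562)],
    [(28, 2, 0.0853)],
    [(9, 1, 0.0418), (4, 0, 0.0183), (5, 0, 0.0071)],    # lower tail, pooled n = 18
]
chi2 = pooled_chi_square(cells)
degrees_of_freedom = len(cells) - 2  # two parameters, a and b, were fitted
print(round(chi2, 2), degrees_of_freedom)
```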


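(b) (i) We may obtain an estimate of the point of dichotomy ($\beta$) by
referring to equation (21) and using the estimate

$$\hat{\beta} = \frac{5 - \hat{a}}{\sqrt{1 + \hat{b}^2}}.$$

We may also find an estimate $B$ of $\beta$ by the relation

$$p = \int_{B}^{\infty} \operatorname{erf}(u)\,du,$$

where $p$ = (Total number of successes in sample) / $N$.

(ii) It is of some interest to examine the large-sample variances of these
two estimates of $\beta$. We have, for large $N$,

$$\operatorname{var}(\hat{\beta}) = \frac{1}{1 + b^2} \left[ \operatorname{var}(\hat{a}) + \beta^2 \rho^2 \operatorname{var}(\hat{b}) + 2\beta\rho \operatorname{cov}(\hat{a}, \hat{b}) \right],$$

$$\operatorname{var}(B) = [\operatorname{erf}(\beta)]^{-2} \operatorname{var}(p).$$

The expressions for $\operatorname{var}(\hat{a})$, $\operatorname{var}(\hat{b})$, and $\operatorname{cov}(\hat{a}, \hat{b})$ are given in equation (17),
while $\operatorname{var}(p)$ is given by the well-known result,

$$\operatorname{var}(p) = \frac{p(1 - p)}{N}.$$

Some values of $N \operatorname{var}(\hat{\beta})$ and $N \operatorname{var}(B)$ are given in Table 4. The values
of $N \operatorname{var}(\hat{\beta})$ were calculated in the same way as were those of $N \operatorname{var}(G)$.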
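
The large-sample variance of the second estimate depends only on the cut-off, which makes it easy to evaluate. The following Python sketch computes N var(B) as p(1 − p) divided by the squared normal ordinate at β; the function name is a hypothetical choice.

```python
import math


def n_var_b_from_proportion(beta: float) -> float:
    """N * var(B) for the cut-off estimated from the overall proportion of successes."""
    p = 0.5 * math.erfc(beta / math.sqrt(2.0))                   # proportion above beta
    z = math.exp(-0.5 * beta * beta) / math.sqrt(2.0 * math.pi)  # normal ordinate at beta
    return p * (1.0 - p) / (z * z)


for beta in (0.0, 0.5, 1.0, 1.5):
    print(beta, round(n_var_b_from_proportion(beta), 3))
```

The printed values can be compared with the N var(B) columns of Table 4, which are the same for every value of ρ.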


TABLE 4
N var(β̂) and N var(B) for Various Values of ρ and β

              ρ = .4                 ρ = .6                 ρ = .8
  β     N var(β̂)  N var(B)     N var(β̂)  N var(B)     N var(β̂)  N var(B)
  0      1.351     1.571        1.146     1.571         .925     1.571
 .5      1.612     1.721        1.480     1.721        1.396     1.721
1.0      2.417     2.280        2.668     2.280        3.375     2.280
1.5      4.745     3.716        6.466     3.716       10.650     3.716


This table may be contrasted with Table 1. Whereas $G$ is a "more
efficient" estimate of $\rho$ than $r_{bis}$ for all values of $\rho$ and $\beta$ under consideration,
we find that $\hat{\beta}$ is a "more efficient" estimate of $\beta$ than $B$ only for certain values
of $\rho$ and $\beta$.
2.10. An experiment was carried out in which 100 samples of 100
individuals each were drawn from a normal bivariate distribution with $\rho = .6$.
One variate was dichotomized with $\beta = .5$ and 100 $G$'s were calculated.


FIGURE 2
Distribution of 100 Samples of G (100 Cases in Each Sample) from Normal Population
in Which ρ = .6


Figure 2 shows a histogram of the observed distribution of 100 $G$'s. We also
have the results shown in Table 5.

In view of the good agreement between observed and expected figures
in this example and the fact that the distribution of $G$ (see Fig. 2) is not
very skew, it seems reasonable to assume that for $N = 200$ and more the
sampling distribution of $G$ will be very nearly normal.


TABLE 5
Observed and Theoretical Values in 100 Samples from Normal Bivariate Population
with ρ = .6 (100 Cases in Each Sample)

                 Observed    Theoretical
Mean G             .610         .600
Mean r_bis         .616         .600
Mean β̂             .505         .500
Mean B             .506         .500
var(G)             .0084        .0079
var(r_bis)         .0094        .0091
var(β̂)             .0168        .0134
var(B)             .0201        .0172


3. The Restricted Normal Surface


3.1. A problem which arises quite often in practice is that of estimating
the correlation between two variates when we have only a "restricted"
sample available. This happens, for example, when candidates are selected
for a certain course of training by some "screening" device so that final
criterion follow-up data are not available for a random sample from the total
population.

If the restriction occurs in such a way that equation (1) still holds
although the marginal $x$-distribution may not be normal, we may still find
an estimate of $\rho$ in the total population by the procedure outlined in
paragraphs 2.4 through 2.9. An approximation to the standard error of $G$ may,
as before, be found, using equations (20) and (18).

3.2. We now give an example in which the $x$-distribution is truncated,
that is, corresponding to a practical situation in which only those applicants
are selected who score above a certain level in a test. A sample of 1000
individuals was drawn from a normal bivariate distribution with $\rho = .6$,
the $y$-variate being dichotomized at $\beta = +.5$. The $x$-distribution was then
truncated at $-.25$ (in standard measures) resulting in the data given in
Table 6.


TABLE 6
Sample from Normal Bivariate Population with ρ = .6, Truncated at x = -.25 S.D.

   X       n      m     p (%)    Probit     Y       nw       y        nwX
 3.424      1      1    100.0      ∞       6.92      .15    7.35      .51360
 2.935      2      2    100.0      ∞       6.56      .50    7.06     1.46750
 2.446      9      7     77.8     5.76     6.19     3.36    5.65     8.21856
 1.961     30     24     80.0     5.84     5.84    14.71    5.84    28.84631
 1.468     57     37     64.9     5.38     5.46    33.59    5.38    49.31012
  .979    136     76     55.9     5.15     5.09    86.33    5.15    84.51707
  .490    184     63     34.2     4.59     4.72   113.84    4.60    55.78160
  .000    189     51     27.0     4.39     4.35   103.02    4.39      .00000

Totals    608    261


The "provisional" regression line and the calculated line are shown in
Figure 3.


FIGURE 3
"Eye Fit" and Calculated Regression Line for Example Where x is Truncated


Using the $Y$-values from the provisional line and the corresponding
working probits, the following sums were calculated:

$$\sum nw = 355.5000,$$
$$\sum nwX = 228.6548,$$
$$\sum nwy = 1710.7584,$$
$$\sum nwXy = 1186.1795,$$
$$\sum nwX^2 = 265.1983.$$

Using these sums, we find

$$\hat{b} = \frac{85.8334}{118.1294} = .7266,$$

so that $G = .5878$.


The calculated regression line, shown in Figure 3, is

$$Y = 4.3450 + .7266X.$$

We also have, approximately,

$$\text{S.E.}(G) = .0487.$$

We may test our assumptions in paragraph 3.1 by calculating $\chi^2$ in the
same way as was done in Table 3. For this example $\chi^2 = 2.238$ for 3 d.f.
The estimate of $\beta$ using equation (21) is $\hat{\beta} = .5299$.
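
The cut-off estimate can be checked with a one-line calculation. The sketch below reads equation (21) as the cut-off in standard measure being (5 − a) divided by the square root of (1 + b²); the function name is hypothetical, and the inputs are the intercept and slope of the calculated line above.

```python
import math


def cutoff_estimate(a_hat: float, b_hat: float) -> float:
    """Estimate of the point of dichotomy in standard measure, from equation (21)."""
    return (5.0 - a_hat) / math.sqrt(1.0 + b_hat ** 2)


print(round(cutoff_estimate(4.3450, 0.7266), 4))
```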

4. Conclusion. In conclusion we want to emphasize the fact that $G$ is
a consistent estimate of $\rho$ even when the sample is "restricted" as indicated
in paragraph 3.1. It is of course well-known that in these circumstances $\rho$
cannot be estimated by $r_{bis}$, so that it would always seem to be advisable
to use $G$ rather than $r_{bis}$, because even when both are consistent estimates
of $\rho$, $G$ is the more efficient estimate.


REFERENCES

1. Finney, D. J. Probit analysis. Cambridge: Cambridge Univ. Press, 1947.

2. Finney, D. J., and Stevens, W. L. Table for the calculation of working probits and
weights in probit analysis. Biometrika, 1948, 35, 191-201.

3. Gillman, L., and Goode, H. H. An estimate of the correlation coefficient of a
bivariate normal population when X is truncated and Y is dichotomized. Harv. educ.
Rev., 1946, 16, 52-55.

4. Sichel, H. S. First peace-time validation of army selection tests with a discussion of
some statistical problems encountered in this project. Bulletin of the National Institute
for Personnel Research of the South African Council for Scientific and Industrial
Research, 1950, 2, 4-35.

5. Tocher, K. D. A note on the analysis of grouped probit data. Biometrika, 1949, 36,
9-17.


Manuscript received 6/9/52 
Revised manuscript received 7/31/52 


PSYCHOMETRIKA—VOL. 18, No. 2 
JUNE, 1953 


A SIMPLE PROCEDURE FOR REARRANGING MATRICES* 


W. A. GIBSON 


UNIVERSITY OF NORTH CAROLINA 


Guttman’s scalogram board technique for reordering the columns and
rows of a matrix is described and its disadvantages are pointed out. A simple
and inexpensive procedure for doing the same job without these disadvantages
is outlined.


There are a number of problems in psychometrics for whose solution it would
be helpful to have an easy and inexpensive method for altering the order
of the rows and columns of a matrix. One example of this kind of problem
is the attempt to find a Spearman hierarchy in a correlation matrix. Another
is cluster analysis or the selection of highly inter-related subgroups of tests
for such factoring procedures as the grouping or multiple-group methods.
Perhaps the most notable example is Guttman’s scalogram analysis.† In fact,
this paper might well have been entitled “A Cheap Scalogram Board.”

In scalogram analysis the matrix to be rearranged is the score matrix,
in which the rows represent the members of the experimental sample, while
the column headings are the response categories for questionnaire items.
In any row of this matrix, the only cells which are filled are those correspond-
ing to the response categories which the individual involved has endorsed.
These entries are 1’s or X’s or check marks, while all other cells are regarded
as containing zeros. The task of scalogram analysis is first to reorder the
rows of this matrix and then to shuffle the columns in such a way as to come
out with the closest approximation to a parallelogram pattern of the non-zero
entries. To accomplish this, Guttman has invented the scalogram
board, a device consisting of a rack which holds a hundred narrow strips
of wood, each strip representing a row of the score matrix. There are a
hundred recesses drilled into each strip to represent one hundred cells in
that row of the matrix. The response pattern for any person is indicated
by placing buckshot or small ball bearings in the recesses corresponding to
the response categories he checks. When this is done for every person,
the rows of the score matrix can be reordered to suit the investigator. When
the time comes to interchange columns, a second board, identical with the
first, is placed upside down on top of the first board, with its wood strips
at right angles to those of the first board. The two boards are then held
together in that relationship and turned over, so that the balls in the first
board fall into the corresponding recesses in the second board. The first
board (which is now on top and completely empty of balls) is then removed,
and the investigator can proceed to rearrange the columns of the score
matrix, for the movable strips in the second board now represent those
columns.

*[...] Professor Jozef Cohen of the University of Illinois for a five-minute
[...] which greatly [...] the procedure described here.
†Louis Guttman. The scalogram board technique for scale analysis. In
Samuel A. Stouffer, et al., Measurement and prediction. Princeton: Princeton Univ. Press,
1950. Ch. 4.

Perhaps the main disadvantage of the scalogram board is its prohibitive 
cost, which is greatly increased by the precision of manufacture that is 
necessary in order that all balls from the first board will fall freely into the 
second when the two are turned over. Great uniformity is thus required 
in the spacing of the recesses and in the widths of the strips. Cost estimates 
run into the hundreds of dollars. A second disadvantage that might be
mentioned is the time required to place the balls in their proper positions 
in the board. Even the dropper which has been designed for this work* 
will not be nearly so fast as the placing of check marks in the proper cells 
of a data sheet. A third drawback of this method for rearranging matrices 
is that it is applicable to tables of qualitative data only. That is, the score 
matrix may show only presence or absence of an attribute (indicated by 
presence or absence of a ball in a recess) and cannot reflect quantitative 
differences such as are usually present in mental test score matrices and in 
correlation tables. This drawback is of no consequence for scalogram 
analysis itself, which deals exclusively with qualitative data, but it prevents 
the use of scalogram boards in reordering many other types of matrices. 
One possible way to overcome this defect would be to employ beads of different 
colors in place of the metal balls, but there would still be a limit to the number 
of differentiable colors that could be used. A fourth restriction is that the 
capacity of the board, in terms of the maximum number of columns and 
rows it can represent, is fixed once the board has been constructed. 

Let us now take up the description of a simple procedure which will
overcome the disadvantages of the scalogram board that have been mentioned,
while at the same time introducing no serious new drawbacks of its own.
The matrix to be rearranged is first recorded on an ordinary data sheet.
To avoid certain difficulties later on, it may prove desirable to utilize only
every other column and row in this recording process. The completed
data sheet is taken to an ordinary paper cutter and is cut into strips in such
a way that each strip is a row of the original matrix. If the narrow strips
of the data paper tend to twist or curl either immediately or after considerable
handling, the sheet can be fastened, before cutting, to a piece of stiff card-
board by strips of masking tape applied to its right and left edges. Great
uniformity in the cutting process is unnecessary. The resulting strips can
be manipulated just as are the wood strips of the scalogram board. They
can even be turned over or dropped on the floor without the numbers or
check marks falling off.

*Ibid., p. 96.

When the investigator is satisfied with his new ordering of the rows and
is ready to shuffle the columns, he can align the row strips on a drawing board
with any desired degree of care and fasten them all down, again using two
strips of masking tape at the sides, or by some other means. A graduate
assistant is then entrusted with the mission of carrying this material to the
nearest photostating laboratory, where a photostatic negative is made. A
positive is unnecessary. The negative is returned to the paper cutter and
chopped into strips corresponding to the columns of the matrix. These
strips can be manipulated freely to achieve a new ordering of the columns,
and a second trip to the photostating office (or mere copying off) gives the
reordered matrix in one piece. The second negative (a double negative
yields a positive) could of course be cut into row strips if further reordering
of the rows were indicated, and any number of additional cycles of this
kind could easily be undertaken.

The estimated total cost of a venture of this kind is in the neighborhood
of a dollar or two, and if funds run out at any stage the photostating process
can be replaced by manual transcription, which is especially simple for
qualitative data. Very possibly some interesting variants of the procedure
outlined here will occur to the reader.*

*Bles Narevsky, psychology department secretary at the University of
[...] was drafted, has suggested that the hectographing process might
very well be adapted to this task, and one of the reviewers suggests the use of Eastman dry
mounting tissue and a sheet of cardboard to amalgamate the reordered row strips into a
single sheet which can then be cut into column strips, thus eliminating the need for photo-
stating.

Manuscript received 3/26/52

Revised manuscript received 5/8/52
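
One cycle of the procedure described in this paper (shuffle the row strips, photostat, cut into column strips, shuffle again) can be mimicked in a few lines. This is only an illustrative sketch: the function names are ours, and the ordering rule used here (strips with the most check marks first) is a hypothetical stand-in for whatever judgment the investigator applies when shuffling strips by hand.

```python
def reorder_rows(matrix, order):
    """Return the rows of `matrix` in the given order (shuffling the strips)."""
    return [list(matrix[i]) for i in order]


def transpose(matrix):
    """Turn columns into rows, as cutting the photostat into column strips does."""
    return [list(col) for col in zip(*matrix)]


def by_descending_total(matrix):
    """Hypothetical ordering rule: strips with the most check marks come first."""
    return sorted(range(len(matrix)), key=lambda i: -sum(matrix[i]))


def one_cycle(matrix):
    """Reorder the rows, then the columns, and return the matrix in one piece."""
    rows_done = reorder_rows(matrix, by_descending_total(matrix))
    as_columns = transpose(rows_done)
    cols_done = reorder_rows(as_columns, by_descending_total(as_columns))
    return transpose(cols_done)


# A small made-up score matrix: rows are persons, columns are response categories.
scores = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
result = one_cycle(scores)
for row in result:
    print(row)
```

Because the entries are carried along unchanged, the same sketch works for quantitative entries such as correlations, which is the advantage claimed for paper strips over the board.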




PSYCHOMETRIKA—VOL. 18, No. 2 
JUNE, 1953 


A TABLE FOR THE RAPID DETERMINATION OF THE 
TETRACHORIC CORRELATION COEFFICIENT* 


MELVIN D. DAVIDOFF AND HOWARD W. GOHEEN


U. S. CIVIL SERVICE COMMISSION


A table is developed and presented to facilitate the computation of the
Pearson $Q_3$ (“cosine method”) estimate of the tetrachoric correlation coeffi-
cient. Data are presented concerning the accuracy of $Q_3$ as an estimate of the
tetrachoric correlation coefficient, and it is compared with the results ob-
tainable from the Chesire, Saffir, and Thurstone tables for the same four-fold
frequency tables.
Introduction 


The tetrachoric correlation coefficient has been extensively employed
since its introduction by Pearson (6) in 1901. Its adaptability to the easy
handling of certain kinds of data has made it a popular technique. The
tremendous amount of computational labor involved in handling the original
formula has motivated many persons to attack the problem of simplifying
the computational process. Pearson himself (6) derived several estimates
of the tetrachoric, one of which, $Q_3$, is the particular concern of this paper.
This estimate was chosen because of its general adequacy and more partic-
ularly for its adaptability to the mathematical manipulations involved.
In slightly modified form, $Q_3$ has been frequently employed as the “cosine
method.” Chesire, Saffir, and Thurstone (1) collaborated in the development
of computing tables for the tetrachoric coefficient. Hamilton (3) presented
a nomogram based on the cosine method. Hayes (4) worked out tables
based on percentage differences, for which Goheen and Kavruck (2) offered
simplified work sheets. Jenkins (5) offered graphical methods for rapid
determination of the tetrachoric.
The authors some years ago saw a copyrighted and unpublished graph
for $r_{tet}$ by Commander A. P. Webster, a Navy biologist. This graph was
entered with the parameter of the ratio of cross-products of the fourfold
table diagonal cells. The one graph was used for all cuts. The authors have
seen no derivation of this graph and had no idea of its theoretical origin.
The usefulness of such a parameter and the idea of employing only one
table instead of the large number involved in the Chesire, Saffir, and Thur-
stone method interested us. We therefore set out to derive an estimate of
$r_{tet}$ from the parameter mentioned above or any parameter not involving
the position of the cuts on the variables. The $r_{tet}$ values obtained are appar-
ently identical to those obtainable from the Webster graph, and the inference
we make is that Webster too worked from the $Q_3$ estimate of Pearson.*
Pearson (6) made a brief assessment of $Q_3$. He ran 15 trials of the tetra-
choric value with that given by his various estimates. $Q_3$ had an average
absolute discrepancy of .021 from the actual tetrachoric value.† Further
manipulation of the cosine variation of Pearson’s empirical formula has
enabled the present authors to develop this table which yields the actual cosine
method or Pearson $Q_3$ values.

*The authors are indebted to Mr. John Scott, Chief of the Test Development S[...]
of the U. S. Civil Service Commission, for his [...] and to Miss Elaine [...] and
Mrs. Elaine Nixon for the large amount of computational work involved in this paper.


Derivation
With the cells of the fourfold table labeled as follows,

$$\begin{array}{c|c} a & b \\ \hline c & d \end{array}$$

Pearson’s empirical formula for estimating the tetrachoric r was stated in
the form

$$Q_3 = \sin\left[\frac{\pi}{2}\left(\frac{\sqrt{ad} - \sqrt{bc}}{\sqrt{ad} + \sqrt{bc}}\right)\right].$$

Since the sine of an angle is equal to the cosine of its complement,

$$Q_3 = \cos\left[\frac{\pi}{2} - \frac{\pi}{2}\left(\frac{\sqrt{ad} - \sqrt{bc}}{\sqrt{ad} + \sqrt{bc}}\right)\right] \tag{1}$$

$$= \cos\left[\frac{\pi}{2}\left(1 - \frac{\sqrt{ad} - \sqrt{bc}}{\sqrt{ad} + \sqrt{bc}}\right)\right]$$

$$= \cos\left[\frac{\pi}{2}\left(\frac{2\sqrt{bc}}{\sqrt{ad} + \sqrt{bc}}\right)\right]$$

$$= \cos\left[\frac{\pi\sqrt{bc}}{\sqrt{ad} + \sqrt{bc}}\right]. \tag{2}$$

This is the form which has been frequently employed under the name
“cosine method.”

The table presented in this article gives the actual $Q_3$ or cosine method
value for each $r_{tet}$ from .00 to 1.00. As has been indicated, one needs to
enter the table with only the value of ad/bc (or its reciprocal if it is larger),
thus facilitating greatly the determination of the $r_{tet}$ from a basic fourfold
table. This is achieved in the following manner:
Divide numerator and denominator of the angle of (2) by $\sqrt{bc}$:

$$Q_3 = \cos\left[\frac{\pi}{1 + \sqrt{ad/bc}}\right].$$

To change from radian measure to degrees, multiply the angle by 180/π:

$$Q_3 = \cos\left[\frac{180^\circ}{1 + \sqrt{ad/bc}}\right]. \tag{3}$$

For ease in construction of the table, the following transformation was made:

$$\frac{180}{1 + \sqrt{ad/bc}} = \operatorname{arc\,cos} Q_3,$$

$$1 + \sqrt{\frac{ad}{bc}} = \frac{180}{\operatorname{arc\,cos} Q_3},$$

$$\sqrt{\frac{ad}{bc}} = \frac{180}{\operatorname{arc\,cos} Q_3} - 1,$$

$$\frac{ad}{bc} = \left(\frac{180}{\operatorname{arc\,cos} Q_3} - 1\right)^2. \tag{4}$$

Equation (4) was used in constructing the attached table.

*This inference has been confirmed in recent correspondence with Commander
Webster.
†In the trial run every value of $Q_3$ was higher than the corresponding tetrachoric
coefficient.
To use the table, set the data up in the fourfold table as indicated
earlier. Enter the table with the value ad/bc or its reciprocal (whichever is
the larger) and read its corresponding $r_{tet}$ value. If the table is entered
with the reciprocal, the sign of the resultant $r_{tet}$ will be negative. Since
the accuracy of the values given for $r_{tet}$ does not extend beyond the second
decimal, interpolation between the values listed for ad/bc is not recommended.
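
As a cross-check on table look-ups, Equations (3) and (4) can be evaluated directly. The sketch below is ours, not the authors': the function names `q3` and `ratio_for` are hypothetical, and the cell proportions in the example are taken from one row of Table 3 (a = .16, b = .04, c = .34, d = .46).

```python
from math import acos, cos, degrees, pi, sqrt


def q3(a, b, c, d):
    """Cosine-method estimate from the four cells, following Equation (3).

    The angle is kept in radians, which is equivalent to the degree form.
    A ratio ad/bc below 1 gives a negative estimate automatically.
    Not defined when b * c is zero (a zero cell frequency).
    """
    ratio = (a * d) / (b * c)
    return cos(pi / (1.0 + sqrt(ratio)))


def ratio_for(q):
    """Equation (4): the value of ad/bc at which the estimate equals q."""
    return (180.0 / degrees(acos(q)) - 1.0) ** 2


estimate = q3(0.16, 0.04, 0.34, 0.46)
print(f"ad/bc = {0.16 * 0.46 / (0.04 * 0.34):.2f}, Q3 = {estimate:.2f}")
print(f"ad/bc giving Q3 = .50: {ratio_for(0.50):.2f}")
```

For this row the ratio ad/bc is about 5.41, which lies in the interval that Table 1 lists for .59, and the direct computation rounds to the same value.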


The Accuracy of $Q_3$ as an Estimate of $r_{tet}$

In this section we compare the actual $r_{tet}$, the value found in the Ches-
ire, Saffir, and Thurstone diagrams ($r_{Th}$), and $Q_3$. The procedure used in
this checking is as follows:
Various marginal totals are assumed as indicated below and various
actual $r_{tet}$ values are also assumed. The marginal totals and assumed $r_{tet}$
have been put into the Pearson formula for the actual $r_{tet}$:

$$[\ldots] = r + r^2\,\frac{hk}{2} + r^3\,\frac{(h^2 - 1)(k^2 - 1)}{6} + \cdots$$


TABLE 1

Pearson's $Q_3$ Estimates of $r_{tet}$ for Various Values of ad/bc

r_tet   ad/bc          r_tet   ad/bc          r_tet   ad/bc

 .00    0-1.00          .35    2.49-2.55       .70    8.50-8.90
 .01    1.01-1.03       .36    2.56-2.63       .71    8.91-9.35
 .02    1.04-1.06       .37    2.64-2.71       .72    9.36-9.82
 .03    1.07-1.08       .38    2.72-2.79       .73    9.83-10.33
 .04    1.09-1.11       .39    2.80-2.87       .74    10.34-10.90
 .05    1.12-1.14       .40    2.88-2.96       .75    10.91-11.51
 .06    1.15-1.17       .41    2.97-3.05       .76    11.52-12.16
 .07    1.18-1.20       .42    3.06-3.14       .77    12.17-12.89
 .08    1.21-1.23       .43    3.15-3.24       .78    12.90-13.70
 .09    1.24-1.27       .44    3.25-3.34       .79    13.71-14.58
 .10    1.28-1.30       .45    3.35-3.45       .80    14.59-15.57
 .11    1.31-1.33       .46    3.46-3.56       .81    15.58-16.65
 .12    1.34-1.37       .47    3.57-3.68       .82    16.66-17.88
 .13    1.38-1.40       .48    3.69-3.80       .83    17.89-19.28
 .14    1.41-1.44       .49    3.81-3.92       .84    19.29-20.85
 .15    1.45-1.48       .50    3.93-4.06       .85    20.86-22.68
 .16    1.49-1.52       .51    4.07-4.20       .86    22.69-24.76
 .17    1.53-1.56       .52    4.21-4.34       .87    24.77-27.22
 .18    1.57-1.60       .53    4.35-4.49       .88    27.23-30.09
 .19    1.61-1.64       .54    4.50-4.66       .89    30.10-33.60
 .20    1.65-1.69       .55    4.67-4.82       .90    33.61-37.79
 .21    1.70-1.73       .56    4.83-4.99       .91    37.80-43.06
 .22    1.74-1.78       .57    5.00-5.18       .92    43.07-49.83
 .23    1.79-1.83       .58    5.19-5.38       .93    49.84-58.79
 .24    1.84-1.88       .59    5.39-5.59       .94    58.80-70.95
 .25    1.89-1.93       .60    5.60-5.80       .95    70.96-89.01
 .26    1.94-1.98       .61    5.81-6.03       .96    89.02-117.54
 .27    1.99-2.04       .62    6.04-6.28       .97    117.55-169.67
 .28    2.05-2.10       .63    6.29-6.54       .98    169.68-293.12
 .29    2.11-2.15       .64    6.55-6.81       .99    293.13-923.97
 .30    2.16-2.22       .65    6.82-7.10      1.00    923.98 -
 .31    2.23-2.28       .66    7.11-7.42
 .32    2.29-2.34       .67    7.43-7.75
 .33    2.35-2.41       .68    7.76-8.11
 .34    2.42-2.48       .69    8.12-8.49


In actual use the left side of the equation was reduced as in Peters
and Van Voorhis (7, 369) to $\frac{ad - bc}{z z' N^2}$. Use of proportions in the cells of
the fourfold table eliminates the $N^2$ in the denominator. (ad − bc) is then
found from the equation. All the cell frequencies (in proportions) of the
fourfold table can then be found from our knowledge of (ad − bc), (a + b),
and (a + c). Using these cell frequencies we were able to obtain the correspond-
ing values of $r_{Th}$ and $Q_3$.
The marginal totals listed in the left-hand column of Table 2 give unique
values of the tetrachoric. Those listed in the right-hand column yield $r_{tet}$'s
identical with the corresponding marginal totals in the left-hand column.
They come about merely as a result of one of the following conditions:

a. column reflection,
b. row reflection,
c. row and column reflection,
d. interchange of (a + b) and (a + c).

TABLE 2

Marginal values studied     Marginal values yielding tetrachorics ($r_{tet}$'s)
(a + b)   (a + c)           identical with the corresponding set of values on the left

  .2        .2              .8, .2   [...]
  .2        .3              [...]   .3, .2   [...]
  .2        .5              .8, .5   .5, .8   .5, .2
  .3        .3              [...]   .3, .7   [...]
  .3        .5              .7, .5   .5, .7   .5, .3
  .5        .5


Originally Table 3 was set up on the basis of three terms of the series.
Subsequently it occurred to us to check what changes would ensue if we used
four terms, and the operation was repeated. Among the values in the present
table, based on four terms of the series, 8 changes occurred. In all cases these
changes were due to very small changes (.01 or .02) in the values of a, b, c, d.
These very small changes, due in most instances to rounding in the second
decimal place, caused some fluctuation in both $Q_3$ and $r_{Th}$ about $r_{tet}$. Obvi-
ously, however, these changes were quite unreliable (because of the small
frequency changes causing them). No changes were noted that would cause
any change in the judgment of the level of accuracy of the tetrachoric esti-
mates involved. The value in the table indicated by a dash (-) came about
because of a zero cell frequency. The actual issue in this table, however, is
the comparison of $Q_3$ and $r_{Th}$.
$Q_3$ and $r_{Th}$ are generally in close agreement. When they differ, $Q_3$ always
seems to be greater than $r_{Th}$. As would be expected, the best agreement
seems to be at a .5, .5 split.

TABLE 3*

(a+b)  (a+c)   r_tet     a     b     c     d     r_Th    Q3

 .2     .2      .2      .06   .14   .14   .66    .23    .27
                .3      .07   .13   .13   .67    .34    .38
                .5      .09   .11   .11   .69    .53    .57
                .7      .11   .09   .09   .71    .68    .72
                .8      .13   .07   .07   .73    .82    .84
                .9      .15   .05   .05   .75   [...]  [...]

 .2     .3      .2      .08   .12   .22   .58    .19    .22
                .3      .09   .11   .21   .59    .30    .32
                .5      .12   .08   .18   .62    .54    .57
                .7      .14   .06   .16   .64    .68    .71
                .8      .16   .04   .14   .66    .81    .83
                .9      .18   .02   .12   .68    .91    .93

 .2     .5      .2      .12   .08   .38   .42    .17    .20
                .3      .13   .07   .37   .43    .28    .30
                .5      .16   .04   .34   .46    .53    .59
                .7      .18   .02   .32   .48    .72    .78
                .8      .19   .01   .31   .49    .82    .88
                .9      .20   0     .30   .50     -      -

 .3     .3      .2      .12   .18   .18   .52    .24    .25
                .3      .13   .17   .17   .53    .31    .33
                .5      .16   .14   .14   .56    .52    .54
                .7      .19   .11   .11   .59    .70    .71
                .8      .21   .09   .09   .61    .80    .81
                .9      .23   .07   .07   .63    .87    .88

 .3     .5      .2      .18   .12   .32   .38    .21    .22
                .3      .19   .11   .31   .39    .30    .30
                .5      .22   .08   .28   .42    .49    .51
                .7      .25   .05   .25   .45   [...]  [...]
                .8      .27   .03   .23   .47    .80    .84
                .9      .29   .01   .21   .49    .90    .91

 .5     .5      .2      .28   .22   .22   .28    .18    .19
                .3      .30   .20   .20   .30    .31    .31
                .5      .33   .17   .17   .33    .48    .48
                .7      .37   .13   .13   .37    .68    .68
                .8      .39   .11   .11   .39   [...]  [...]
                .9      .41   .09   .09   .41    .84    .84

*$r_{tet}$ was always assumed to be positive. The accuracy checks are, however,
perfectly generalizable to negative values.


REFERENCES

1. Chesire, L., Saffir, M., and Thurstone, L. L. Computing diagrams for the tetrachoric
correlation coefficient. Chicago: Univ. of Chicago Bookstore, 1933.
2. Goheen, H. W., and Kavruck, S. A worksheet for tetrachoric r and standard error of
tetrachoric r using Hayes diagrams and tables. Psychometrika, 1948, 13, 279-280.
3. Hamilton, M. Nomogram for the tetrachoric correlation coefficient. Psychometrika, 1948,
13, 259-269.
4. Hayes, S. P., Jr. Diagrams for computing tetrachoric correlation coefficients from per-
centage differences. Psychometrika, 1946, 11, 163-172.
5. Jenkins, W. L. A single chart for tetrachoric r. Educ. psychol. Meas., 1950, 10, 142-144.
6. Pearson, K. Mathematical contribution to the theory of evolution, VII. On the correla-
tion of characters not quantitatively measurable. London: Philos. Trans. roy. Soc.,
195A, 1901.
7. Peters, C. C., and Van Voorhis, W. R. Statistical procedures and their mathematical
bases. New York: McGraw-Hill, 1941.


Manuscript received 7/18/52


Revised manuscript received 11/2/52


PSYCHOMETRIKA—VOL. 18, No. 2
JUNE, 1953


A SPECIAL REVIEW OF
HAROLD GULLIKSEN, THEORY OF MENTAL TESTS*


LOUIS GUTTMAN


THE ISRAEL INSTITUTE OF APPLIED SOCIAL RESEARCH


The most recent effort to integrate the sprawling statistical literature on
mental testing is Theory of Mental Tests, by Harold Gulliksen. First the
coverage of the book will be sketched, then treatment of the topics analyzed.
Of the twenty main chapters, this reviewer classified twelve as being devoted
primarily to reliability, four primarily to validity, three to scoring techniques,
and one to item analysis.
Reliability theory is introduced by adaptations of earlier algebraic treat-
ments, and various conventional formulas ensue. Practical measures for mak-
ing the needed observations are described and criticized. The reliability of
speeded tests was first studied mathematically by Gulliksen himself, and his
approach is described in one of the chapters.
Validity is treated from the points of view of test length and group
heterogeneity. Unusual attention is given to the case of selected populations
and to estimated variances and/or covariances for the entire population. Dis-
tinction is made between “explicit” and “incidental” selection; lack of this
care has caused mistakes in previous literature.
A theoretical framework is least apparent in the last four chapters, on
scoring and on item analysis. This is the state of the extant literature: There
has been virtually no attempt at a coherent theory for these topics. Item
analysis is summed up as follows: “The striking characteristic of nearly all the
methods described [by earlier authors] is that no theory is presented showing
the relationship between the validity or reliability of the total test and the
method of item analysis suggested” (p. 363). Gulliksen's treatment is intended
to pave the way to fill this lack.
Our basic criticisms of the book can be summarized in seven major points:

(a) The theory of reliability is based on the notion of “parallel” tests.
This notion does not lead to a unique definition of the reliability of any given
test and hence cannot serve as the basis for a universal theory of reliability.

(b) No distinction is made between the algebraic consequences of dif-
ferent concepts of reliability. This creates inconsistencies between and within
the concepts and the algebra presented.


*New York: John Wiley & Sons, 1950; xix + 486 pp., $6.00.




(c) Retest theory* defines the most universal kind of error, and all other 
theories introduce additional, variously specialized, notions of deviation. 
Hence retest coefficients are upper bounds to all other types of reliability 
coefficients. The book perpetuates the interpretation that retest coefficients 
are "spuriously" large, instead of pointing out that this larger size must 
theoretically hold if all hypotheses are satisfied. 

(d) Only one full-fledged excursion is made into modern statistical theory,
in connection with Wilks’ statistical test of parallelism of alternate forms.
There is an incomplete excursion with respect to analysis of variance in
Chapter 5; otherwise, old algebraic formulations are retained, with resulting
inconsistencies in formulas. Most of the practical sampling problems of
reliability and validity are not mentioned.

(e) In general, reliability and validity are discussed in terms of test
length. This implies that only a single universe of content is being studied
and one which has a certain kind of structure. But most prediction problems
in testing involve several universes of content and more complex structures.

(f) Whereas an exact multivariate analysis is presented for the param-
eters of multivariate selection, only bivariate techniques are advocated for
item analysis for weighting problems that equally require a multivariate
treatment.

(g) The basic data of most mental tests are qualitative; yet no treatment
is given of the theory of such qualitative data. Instead, an attempt is made to
adapt to qualitative items least-squares theory appropriate to quantitative
items.

Our analysis will be divided into two main parts, one on the problem of 
reliability and one on validity, scoring, and item analysis. 


Reliability Theory 
1. The Formulation in Terms of Parallel Tests 


A general problem in a testing program is to avoid having the test ques- 
tions leak out in advance. One solution is to prepare two or three forms having 
the same content, so that groups tested on different days will get different
forms. It is of interest, then, to know to what extent the various forms are 
comparable. 

*We shall mean by this what Cronbach calls “hypothetical retest with zero time.”
If a test is actually repeated twice on the same population, then each trial has its own
retest coefficient, since the situation may change between trials. The kind of coefficient we
call here “retest” implies no change in situation. “No change” can always be guaranteed
by making but one empirical trial under the conditions of interest; then one can use a
formula for computing a lower bound to the retest coefficient. Lower bound formulas give
correct information about what would happen in an infinite number of trials under un-
changed conditions, and fortunately require but a single empirical trial for calculating their
statistics (3).


This problem has been rephrased by many writers in an attempt to 
construct a theory of reliability. The book before us gives what seems to be 
the most coherent of such approaches to reliability. 

With respect to the book's definition of parallelism, it can be agreed that 
sets of tests can be constructed and people can be found such that the equa- 
tions of Chapter 3 can be satisfied. Tests are parallel if they have common 
means, variances, and intercorrelation coefficients. It is not so easy to see, 
however, that the definition is unique. It seems to this reviewer that one could 
find the same test to belong to more than one set of parallel tests and thus in 
general to have more than one “reliability coefficient.”

Consider the following example of a series of “parallel” tests. Let test 1
consist of but a single item: “Write down all the words you can think of that
begin with the letter f.” For a given population, and a given time limit, the
score for each person is the number of words he writes down beginning with f.

There are at least two different directions in which one could go to con-
struct tests parallel to this one. One direction is to vary the letter involved. For
example, test 2 could be: “Write down all the words you can think of that
begin with p,” while test 3 could use instead the letter d, say. By adjusting
the time limits, all three tests can be made to have the same mean. There
seems no absolute barrier to their also having common variances and corre-
lation coefficients. For our particular population, let us suppose the three
tests are actually parallel, and that their common correlation coefficient is .70.
Then, according to the book's theory, test 1 has reliability coefficient .70.

Another direction in which we could have gone to construct tests parallel
to test 1 is to vary the places of the letter, and not the letter itself. Thus, test 2
could be: “Write down all the words you can think of in which the second letter
is f,” and test 3 could ask for f as the third letter. Again, for our population,
there is no physical bar to the tests turning out to be parallel. But this time,
let us assume that the mutual intercorrelations turn out to be equal to .60.
Then test 1 has reliability .60.

Therefore, test 1 has reliabilities .70 and .60 simultaneously, according
to the theory of parallelism. It should be noted, too, that Gulliksen's nonalge-
braic requirements are also satisfied simultaneously by each series. These
additional requirements are “the tests should contain items dealing with the
same subject matter, items of the same format, etc.” (pp. 173f).
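
The non-uniqueness argued above can be made concrete with two hypothetical correlation matrices, one for each series of tests that includes test 1. The sketch below is ours: the function name and the matrices are illustrative assumptions built from the .70 and .60 figures of the example, and equality of means and variances is simply taken for granted rather than checked.

```python
def common_correlation(corr, tol=1e-9):
    """Return the shared off-diagonal correlation, or None if the values differ.

    Only the intercorrelation condition of parallelism is examined here;
    equal means and equal variances are assumed to hold already.
    """
    n = len(corr)
    off_diagonal = [corr[i][j] for i in range(n) for j in range(n) if i != j]
    if max(off_diagonal) - min(off_diagonal) <= tol:
        return off_diagonal[0]
    return None


# Hypothetical series 1: test 1 with the tests that vary the letter.
set_by_letter = [
    [1.00, 0.70, 0.70],
    [0.70, 1.00, 0.70],
    [0.70, 0.70, 1.00],
]

# Hypothetical series 2: test 1 with the tests that vary the letter's position.
set_by_position = [
    [1.00, 0.60, 0.60],
    [0.60, 1.00, 0.60],
    [0.60, 0.60, 1.00],
]

r_letter = common_correlation(set_by_letter)
r_position = common_correlation(set_by_position)
print(f"test 1 reliability in series 1: {r_letter:.2f}")
print(f"test 1 reliability in series 2: {r_position:.2f}")
```

Each series passes the check on its own, yet the two assign different coefficients to the same test, which is the point of the example.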


2. The Relationship to Retest Reliability Theory
Even were the book’s approach to yield a unique coefficient for some
phenomena, this coefficient in general would differ from that ensuing from the
test-retest theory. We can ask: What would happen if we were to repeat two
parallel tests on the same population under the same conditions with no
memory factor involved? Thus, each form is to have an experimentally inde-
pendent retest.


Gulliksen's proposed coefficient will here be called the “communality 
coefficient,” since its hypotheses are a specialized version of Spearman’s for 
his single-common-factor theory. Both parallel tests have (by definition) the 
same communality coefficient, and this is in general less than each of the retest 
coefficients (regardless of whether the latter are mutually equal or not). This 
is a direct consequence of the formulas of retest theory (3), and has been long 
known in common-factor theory. 

We are interested in the reliability of a test, because it gives information
as to the limitations the test may have in predicting other variables and yields
information as to how well the test can possibly be predicted from other
variables. The retest coefficient gives precisely these types of information,
assuming only experimental independence between criterion and predictor,
and only the retest approach has this generality.

3. Relationship to the Analysis of Variance 


Consider the problem of experimental error in the analysis of variance. 
Hoyt (4) and others have shown that if the parallelism requirements of 
equality of means, variances, and covariances are satisfied, then the resulting 
variance of errors of unreliability is precisely the residual or experimental error 
in the sense of analysis of variance. Indeed, the parallelism equations provide 
the special case where both common-factor theory and analysis of variance 
are identical. If the equations are not satisfied, then one might go on to a 
Spearman analysis or more generally to a multiple-factor analysis. If the equations 
are satisfied, one can continue in the standard sense of analysis of variance, 
and seek additional sources of variation. 

The size of a residual error depends in large part upon the sources of 
variation included in the analysis. The book itself later on indicates the reader 
of a test as a source of variation; many other sources can be studied along the 
usual lines of analysis of variance, provided the equations can be extended to 
hold for the additional sources, thereby reducing the experimental error. 
However, experimental error can never be reduced beyond that obtained by a 
strict replication or retest under precisely identical conditions. 
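
The identification of unreliability with the residual of an analysis of variance can be illustrated with a small sketch. It assumes a complete persons-by-items table and the usual two-way layout without replication, and it takes the reliability estimate to be one minus the ratio of the residual mean square to the between-persons mean square. The function name `hoyt_reliability` and the table `toy_scores` are hypothetical, introduced only for illustration.

```python
def hoyt_reliability(scores):
    """Analysis-of-variance reliability estimate for a persons-by-items table.

    scores: list of rows, one row per person, one column per item.
    Returns 1 - MS(residual) / MS(persons).
    """
    n = len(scores)        # number of persons
    k = len(scores[0])     # number of items
    grand = sum(sum(row) for row in scores) / (n * k)

    # Total, between-persons, and between-items sums of squares.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    person_means = [sum(row) / k for row in scores]
    item_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_persons = k * sum((m - grand) ** 2 for m in person_means)
    ss_items = n * sum((m - grand) ** 2 for m in item_means)

    # The residual is what remains after persons and items are removed.
    ss_resid = ss_total - ss_persons - ss_items
    ms_persons = ss_persons / (n - 1)
    ms_resid = ss_resid / ((n - 1) * (k - 1))
    return 1.0 - ms_resid / ms_persons


# Hypothetical data: five persons, four dichotomous items.
toy_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(hoyt_reliability(toy_scores), 3))
```

Adding a further source of variation to the layout would remove its sum of squares from the residual, which is the sense in which the residual depends on the sources included.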

That the book’s reliability theory is aimed at the problem of universal 
predictability is stated explicitly for the first time in Chapter 14: “In addition, 
parallel tests should have equal validities for predicting any criterion” (p. 181). 
Votaw's sampling theory of compound symmetry is discussed in this connection. 
Now, we have already seen that a test can belong simultaneously to 
more than one parallel set. In such a case, it clearly can have different validities 
from any test parallel to it in a given set. Hence, uniform validities for 
any criterion can not be expected. Votaw’s sampling theory is appropriate to 
the administrative problem that originally gave rise to the concept of parallel 
tests: Given two forms and a particular criterion, can we interchange the 
forms to avoid cheating and yet obtain comparable results? All that is needed 
is comparable validity, which can be attained without parallelism. 




4. On Some of the Algebraic Derivations 

With respect to some of the specific algebraic derivations, the following 
points were noted which seem to require amendments. 

Chapter 4 carefully distinguishes between errors of measurement and 
errors of prediction, but earlier, in Chapter 2, these two seem to have been 
confused. Chapter 2 begins with three variables: X or the observed value, T or 
the true value, and E or the error. It is assumed that 

$$X_i = T_i + E_i , \tag{1}$$

where the subscript denotes the ith respondent. These three variables can 
also conveniently be regarded to be in deviate form, with zero means, and are 
denoted then by lower case letters x, t, and e, respectively. The book states 
that “no assertion regarding probability can be made" for estimating true 
scores from observed scores (p. 19). Since it is the population of respondents 
that is involved, however, “probability” has the direct meaning of being the 
proportion of the testees with true scores in the specified interval. The point 
estimate t' of a true score t from an observed score x can be given in deviate 
form by the following regression: 

$$t' = \frac{s_t^2}{s_x^2}\, x = r_{tx}^2\, x , \tag{2}$$

and its standard error of estimate can be calculated as 

$$s_{t \cdot x} = \sqrt{1 - r_{tx}^2}\; s_t . \tag{3}$$

Instead of (2), the book has erroneously implied that 

$$t' = x , \tag{4}$$

and has erroneously used the variance $s_e^2$ instead of (3). 

Equation (4) is typical of an actual theory of reliability concerned with 
what are usually called "errors of measurement." Chapter 2 is restricted to 
equation (1), with the conditions that 

$$M_E = r_{TE} = r_{EE'} = 0 . \tag{5}$$

Conditions (5) permit only a theory of errors of prediction, and not of errors of 
measurement. Equation (4) cannot be derived from (1) and (5). A wider frame 
of reference is needed involving a universe of experiments as well as a 
population of respondents. 

Chapter 3 does develop a real reliability theory for a given test, using an 
infinite universe of parallel tests as the frame of reference. Its “communality” 
reliability coefficient depends on the situation, the population, and (as pointed 
out above) the universe of content. 

The Kuder-Richardson approach in Chapter 16 also lacks 
an adequate frame of reference. This can be seen immediately from the mere 




fact that it gives no formula for a test composed of but a single item. Chapter 
16 uses the Jackson-Ferguson derivation of the Kuder-Richardson "formula 
20," which uses an assumption that begins in mid-air, with no means of testing 
its mathematical consistency. Within the frame of reference of retest theory, 
it is easy to prove that the Jackson-Ferguson assumption is false in general. 
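
The remark that the formula gives nothing for a test of a single item can be seen directly from its usual form. The sketch below assumes the standard statement of Kuder-Richardson formula 20, r = [k/(k-1)][1 - Σpq / s²], with k items, item difficulties p, q = 1 - p, and s² the variance of total scores computed with divisor n. The function name `kr20` and the table `toy_scores` are hypothetical.

```python
def kr20(scores):
    """Kuder-Richardson formula 20 for a persons-by-items table of 0/1 scores."""
    n = len(scores)        # number of persons
    k = len(scores[0])     # number of items
    if k < 2:
        # The factor k / (k - 1) is undefined for a one-item test.
        raise ValueError("formula 20 is undefined for a single item")

    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores have zero variance")

    # Sum of item variances p * q.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n
        sum_pq += p * (1 - p)

    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical data: five persons, four dichotomous items.
toy_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(toy_scores), 3))
```

Calling `kr20` on a one-column table raises the error, which is the computational counterpart of the point made in the text.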

While the theory of parallel tests is an actual reliability theory, in the 
sense of equation (4) above, it has a further limitation in not leading directly 
to a solution to the problem of speeded tests. Gulliksen has been the first to 
tackle systematically this latter problem, and his conclusions are admittedly 
tentative. To this reviewer, it appears that the argument developed in Chapter 
17 does not flow directly out of the framework of parallelism, but that some 
additional hypotheses have been inserted whose consistency is not clear. In 
particular, the algebra and concepts seem to get blurred with the introduction 
of “split-half estimates of reliability” on the bottom of page 234, leaving the 
justification of the final formulas in doubt. 


Prediction, Scoring, and Item Analysis 


5. The Need for Modern Statistical Theory 

The Preface to Theory of Mental Tests indicates that the book is directed 
to readers who already possess familiarity with elementary statistics and with 
tests of significance (p. vii). The list of symbols (p. xi) carefully distinguishes 
between sample statistics and population parameters. This distinction seems 
not to have been made, however, from the outset of the algebra in Chapter 2 
and throughout the book, except for Chapter 14. 

From a pedagogical point of view, it is doubtful procedure to teach 
students that “over a sufficiently large number of cases the average error [is] 
zero . . .. In actual practice however, it is customary to assume [a zero average] 
for any particular sample that is being considered" (pp. 6-7). 

The book's treatment would have been consistent had it been confined to 
parameters based on an indefinitely large population. Then sampling problems 
could have been treated as they should be, along the style of Chapter 14. 

In some chapters following the special problem of Chapter 14, the need is 
pointed out for a real sampling theory, the case being stated especially well 
with respect to current hodge-podge item-analysis techniques. For other 
problems, it is not made so clear to what extent their solution depends upon 
a sampling theory. 

It is especially with multivariate prediction problems that there is great 
danger in disregarding sampling errors for samples of the size so often used in 
practice—say 100 to 300 cases. Gulliksen cites the example wherein a multiple 
regression with a correlation coefficient of .73 in one sample of 150 cases yielded 
a correlation of zero in a second identical kind of sample. Simple scoring in 
this case held up better from sample to sample, yielding a correlation of 
about .25 in both instances. 




The same danger attends the problem of multivariate selection in Chapter 
10. When is a simple scoring procedure better than the one advocated? 
Unless we know something about the answer to this question, it is doubtful 
whether the multivariate formulas should be used in practice. The same holds, 
perhaps to a lesser extent, for the scoring techniques for maximizing the 
“internal consistency” and "reliability" of composites. 

The uncertainty as to whether samples or populations are meant arises 
again with respect to the discussion of percentiles versus standard and other 
scores in Chapter 19. 

Sampling problems of the kind mentioned here are largely in the province 
of conventional mathematical statistics. They are numerous and important; 
it is to be hoped that mathematical statisticians will be encouraged to tackle 
them. 


6. Qualitative Data and Structural Analysis 

Current sampling theory by itself cannot solve many problems of prediction 
and external validity. Conventional sampling problems concern the 
selection of people from a large population. Mental test theory faces also 
another type of sampling problem: that of selecting items from one or more 
indefinitely large universes of content. This is a basic problem of item analysis. 
To this reviewer it appears that there can be no solution without a structural 
theory. 

Gulliksen develops one of the most rational of prevalent theories of item 
analysis in Chapter 21. That it is rational is shown by the fact that it reveals 
some of its own shortcomings, which Gulliksen carefully points out. That it is 
not entirely rational is shown in part by the central role it gives the Kuder- 
Richardson formula (20), with the attendant confusion as to the meaning of 
"reliability" and whether or not an underestimate is involved instead of an 
estimate. In Chapter 16 it is stated that it has been “demonstrated that the 
value given by [K-R formula (20)] is a lower bound to the reliability coefficient” 
(p. 224). No cognizance is taken of this in Chapter 21, nor is it stated 
what theory of reliability is intended. 

Another shortcoming of the approach stems from restricting one's self 
only to average variances and covariances, and not to a structural theory of 
the intercorrelations. Consider the following example of what is known from 
linear multiple correlation theory. Let two predictors be correlated .60 with 
each other. Then it is better that the criterion correlations be .80 and .00 than 
that both be .40. The first case yields a perfect multiple correlation (= 1.00) 
with the criterion, while the second yields a multiple correlation of only .45. 
Clearly, just knowledge of the average intercorrelations of items with a 
criterion can be of little use in item selection. 
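
The two cases in this example can be checked with the standard two-predictor formula for the squared multiple correlation, R² = (r₁² + r₂² - 2 r₁ r₂ r₁₂) / (1 - r₁₂²), where r₁ and r₂ are the criterion correlations and r₁₂ is the correlation between the predictors. The following is a minimal sketch; the function name `multiple_r` is hypothetical.

```python
import math


def multiple_r(r1y, r2y, r12):
    """Multiple correlation of a criterion with two predictors.

    r1y, r2y: correlations of each predictor with the criterion.
    r12: correlation between the two predictors (must not be +1 or -1).
    """
    r_squared = (r1y ** 2 + r2y ** 2 - 2 * r1y * r2y * r12) / (1 - r12 ** 2)
    return math.sqrt(r_squared)


# Criterion correlations .80 and .00, predictors correlated .60.
print(round(multiple_r(0.80, 0.00, 0.60), 2))

# Criterion correlations both .40, predictors correlated .60.
print(round(multiple_r(0.40, 0.40, 0.60), 2))
```

In both cases the average criterion correlation is .40, yet the first configuration gives a multiple correlation of 1.00 and the second about .45, which is the contrast drawn in the text.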

The example just described is borrowed from the theory of linear least 
squares. The same kind of argument holds, with perhaps even greater force, 
for the kind of data with which Chapter 21 deals. Gulliksen has limited himself 
to the case of dichotomous items. It is not clear why he has not treated 
the data as the qualitative dichotomies they are, but instead has used linear 
least-squares theory. The role of the marginal distributions of each item 
separately should properly come into prominence here. Indeed, for the special 
structures of perfect scales and some kinds of quasi-scales, the marginals tell 
almost the whole story for item selection for prediction purposes,* for they 
determine the intercorrelations among the items. It seems clear that structural 
theories like those already known for scales (and others now in preparation 
for various kinds of non-scales) are needed before any coherent approach 
can be had for item analysis. 
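
The statement that in a perfect scale the marginals determine the intercorrelations can be made concrete. The sketch below assumes a perfect scale in the sense that everyone who passes the harder item also passes the easier one, so that the joint proportion passing both equals the harder item's marginal. Under that assumption the fourfold point correlation (phi) reduces to sqrt[p_hard (1 - p_easy) / (p_easy (1 - p_hard))]. The function names and the two response vectors are hypothetical.

```python
import math


def phi_from_marginals(p_hard, p_easy):
    """Phi coefficient between two items of a perfect scale, from marginals only.

    p_hard, p_easy: proportions passing each item, with p_hard <= p_easy.
    """
    if not 0 < p_hard <= p_easy < 1:
        raise ValueError("need 0 < p_hard <= p_easy < 1")
    return math.sqrt(p_hard * (1 - p_easy) / (p_easy * (1 - p_hard)))


def phi_from_data(x, y):
    """Phi coefficient computed directly from two 0/1 response vectors."""
    n = len(x)
    px = sum(x) / n
    py = sum(y) / n
    pxy = sum(a * b for a, b in zip(x, y)) / n
    return (pxy - px * py) / math.sqrt(px * (1 - px) * py * (1 - py))


# Hypothetical perfect-scale responses of five persons to two items.
easy_item = [1, 1, 1, 1, 0]   # 80 per cent pass
hard_item = [1, 0, 0, 0, 0]   # 20 per cent pass

print(round(phi_from_marginals(0.2, 0.8), 3))
print(round(phi_from_data(hard_item, easy_item), 3))
```

The two computations agree for these data; for items that do not form a perfect scale the marginals alone would not fix the correlation.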


REFERENCES 


1. Gulliksen, Harold. Theory of mental tests. New York: John Wiley & Sons, 1950. 

2. Guttman, Louis. Multiple rectilinear prediction and the resolution into components. 
Psychometrika, 1940, 5, 75-99. 

3. Guttman, Louis. A basis for analyzing test-retest reliability. Psychometrika, 1945, 10, 
255-282. 

4. Hoyt, Cyril. Test reliability estimated by the analysis of variance. Psychometrika, 1941, 
6, 153-160. 


*This is to be distinguished from the purpose of defining a universe of content. Scale 
analysis, or any other statistical technique, is not to be regarded as an appropriate method 
of item selection for defining content. But after the universe is defined, its structure can 
be studied statistically and then items can be easily selected for efficient use for any given 
purpose. 


Manuscript received 1/26/52 


Revised manuscript received 9/28/52 


PSYCHOMETRIKA—VOL. 18, No. 2 
JUNE, 1953 


COMMENTS ON GUTTMAN'S REVIEW OF 
THEORY OF MENTAL TESTS 


HAROLD GULLIKSEN 


EDUCATIONAL TESTING SERVICE 


Dr. Guttman's review of Theory of Mental Tests is essentially an 
attempt to indicate the main avenues along which he would like to see con- 
tributions made to test theory. 

My aim in writing Theory of Mental Tests was to summarize the major 
areas of the literature in the field, to indicate some of the major areas for 
needed work, and to make some progress toward a unified theory. Guttman’s 
review indicates both that these objectives were fulfilled and that much still 
remains to be done. 

For purposes of this discussion the principal adverse comments on the 
book will be grouped under three major criticisms. 

1. Reliability is treated primarily from the standpoint of parallel tests 
which cannot yield a unique coefficient and only secondarily from the retest 
viewpoint that has been developed by Guttman. 

2. Test theory has not been developed in terms of modern statistical 
theory. 

3. Item analysis and item selection are presented in terms of a bivariate 
theory rather than in terms of a multivariate structural theory such as 
Guttman's theory of scales and quasi-scales. 


Reliability 

The value of a “parallel form" reliability lies not only in dealing with 
the practical problem of students becoming familiar in advance with the test 
questions, as Guttman implies, but also in the fact that it is the only feasible 
way yet suggested for dealing with reliability of pure speed tests and partly 
speeded tests. 

As to the lack of uniqueness, it seems to me quite appropriate that one 
important measure of reliability should vary with the skill of the test 
constructor or with his idea of what he is measuring. The ability to construct 
parallel tests is an important one and should not be lost sight of in the 
statistical mazes of analysis of variance. 

For example, suppose we have two tests, one of verbal reasoning and one 
of spatial visualization, both with “retest coefficients" of .95. We find, however, 
that when parallel forms are constructed the two verbal reasoning tests 
correlate .93 while the two spatial tests correlate .71; clearly this fact indicates an 
important difference between the two fields or between our grasp of the two 
fields which calls for further investigation. The "parallel form" reliability is 
a fundamental concept not only from the practical but also from the 
theoretical viewpoint. 

The assumption utilized by Jackson and Ferguson in their development 
of the Kuder-Richardson theory is asserted by Guttman to be “demonstrably 
false in general.” In reality, however, this assumption is false only in the 
sense in which any approximation is false. The point which should be 
emphasized is that under many conditions often encountered in testing work 
the assumption discussed above gives a usable and valuable approximation 
to the reliability of the test. 


Statistical Theory 


In his Section 4, “On Some of the Algebraic Derivations,” Guttman 
presents his formula (2) for the estimate of a true score and formula (3) for the 
standard error of this estimate. These formulas are equivalent respectively 
to formulas (21) and (24) which I presented in Chapter 4. However, this 
least-squares approach did not seem to me to be worth developing in greater 
detail then, since a more thorough reconsideration of the foundations of test 
theory was in preparation by Frederic Lord.* 
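
The least-squares estimate of a true score and its standard error can be sketched numerically. The example assumes the standard regression results: in deviate form the estimated true score is the reliability times the observed deviate, the standard error of estimate is s_x sqrt[r(1 - r)], and the standard error of measurement is s_x sqrt(1 - r), where r is the reliability taken as the ratio of true to observed variance. The function names and the numerical values are hypothetical.

```python
import math


def estimate_true_score(x_deviate, reliability):
    """Regression estimate of the true-score deviate from an observed deviate."""
    return reliability * x_deviate


def standard_error_of_estimate(s_x, reliability):
    """Standard error in predicting true scores from observed scores."""
    return s_x * math.sqrt(reliability * (1 - reliability))


def standard_error_of_measurement(s_x, reliability):
    """Standard deviation of the error scores themselves."""
    return s_x * math.sqrt(1 - reliability)


# Hypothetical test: observed standard deviation 10, reliability .80.
s_x, r = 10.0, 0.80
print(round(estimate_true_score(5.0, r), 3))
print(round(standard_error_of_estimate(s_x, r), 3))
print(round(standard_error_of_measurement(s_x, r), 3))
```

The error of estimate is smaller than the error of measurement by the factor sqrt(r), and the two coincide only as the reliability approaches unity; this is the distinction between errors of prediction and errors of measurement that is at issue.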

In dealing with the problems of multivariate selection, I made a start 
at developing invariant parameters. This approach seems to me a better one 
than the “correction for restriction of range,” “correction for attenuation,” 
etc. The beginnings of such an approach are overlooked in Guttman’s comments. 
I would feel that this approach should eventually supplant the various 
so-called “corrections.” 

Guttman’s suggestion that the book should have stated that the treat- 
ment in general was for “parameters based on an indefinitely large population” 
(italics mine) is an excellent one which provides a uniform and accurate pro- 
cedure for presenting theoretical material in a field where sampling theory is 
still to be developed. As Guttman points out, the solution for sampling prob- 
lems may then be introduced wherever it is available. 

I am definitely in agreement with the view that there is a need for a 
development of test theory more closely related to modern statistical theory. 
Many of these statistical sampling problems have now been indicated in various 
places, including Theory of Mental Tests and Guttman’s present review. 
The need for such work has been amply stated. My hope now is that those who 
are competent in mathematical statistics will aid in advancing test theory 
not only by indicating that psychologists have not yet solved these problems 
but also by presenting solutions to some of the problems that have been 
pointed out. 


*Lord, Frederic M., A theory of test scores. Psychometric Monograph No. 7, 1952. 


Item Analysis 

Guttman criticizes my approach to item analysis as being essentially a 
“bivariate and quantitative" theory rather than a structural and qualitative 
theory. It should be pointed out again that the theory I developed follows 
basically from the procedure of finding total score on a test by assigning a 
“0” or a “1” to the answers and adding the “1's” to find the total score. As 
long as this procedure is followed it seems appropriate to use a quantitative 
approach and inappropriate to insist that the item dichotomies are really 
qualitative. The complete multivariate theory seems unfeasible at present, 
so various simplifying assumptions or approximations are used. I chose one 
of these and worked out some of the consequences. 

As to the comment that for perfect scales or quasi-scales the marginals 
tell almost the whole story and determine intercorrelations, this statement 
does not apply to the usual achievement or aptitude test situation. One would 
be badly misled in selecting achievement or aptitude test items on the basis 
of the Guttman scaling theory assumption that “marginals tell almost the 
whole story for item selection for prediction purposes, for they determine the 
intercorrelations among the items." This assumption, however, seems to have 
been used successfully for attitude and interest scales by the Education and 
Information Branch of the Adjutant General’s Office. 

It is my feeling that many different types of cases should be worked out 
so that some theory will be available for various situations. For any given 
situation one would use the assumption which seemed the nearest to that 
situation. The item-analysis view presented in Theory of Mental Tests is a 
usable procedure for aptitude and achievement tests. 


Manuscript received 2/18/53 


PSYCHOMETRIKA—VOL. 18, No. 2 
JUNE, 1953 


A FACTOR-ANALYTIC STUDY OF REASONING ABILITIES 


RUSSEL F. GREEN 
UNIVERSITY OF ROCHESTER 
J. P. GUILFORD, PAUL R. CHRISTENSEN 
UNIVERSITY OF SOUTHERN CALIFORNIA 
AND 
ANDREW L. COMREY 
UNIVERSITY OF CALIFORNIA AT LOS ANGELES 


[…] of orthogonal rotations, revea[…] factors were interpreted as […] 
perceptual speed, vis[…] reasoning tests were […]ng, logical reasoning, eduction 
of perceptual relations, eduction of conceptual relations, eduction of conceptual 
patterns, eduction of correlates, and symbol substitution. The logical-reasoning 
factor corresponds to what has been called deduction, but eduction of 
correlates is perhaps closer to an ability actually to make deductions. The 
area called induction appears to resolve into three eduction-of-relations 
factors. Reasoning factors do not appear always to transcend the type of 
test material used. 


This is the first in a series of studies designed to explore abilities 
considered to be important in the success of high-level personnel.* In this 
study an attempt was made to isolate and to define more precisely primary 
abilities in the domain of reasoning.† The existence of several distinct 
reasoning factors is generally accepted, but their number and definitions 
are by no means clear. 


Review of Recent Findings and Current Hypotheses 


L. L. Thurstone conducted one of the first studies that brought out 
factors defined as reasoning (9). He defined an induction and a deduction 
factor. He thought that induction items require the examinee to find rules 
or principles, whereas deductive items require him to apply rules or 
principles. He also tentatively proposed a restrictive-reasoning factor, with an 
arithmetic-reasoning test as its chief referent. He thought that such problems 
restrict the channels of reasoning by the conditions prescribed. 

*Under Contract N6onr-23810 with th[…] by the Office of Naval Research. These studies are under 
the general direction of J. P. Guilford. P. R. Christensen […] direct charge of this study during […] 
of its progress. 

†Lack of space prevents our reporting all phases of this study in detail here. For 
more detailed information see (2, 4, 5 

The Army Air Forces Aviation Psychology Research Report No. 5 
describes several analyses leading to the conclusion that there are three 
reasoning factors (3). These were rather non-committally designated as 
reasoning I, II, and III. Reasoning I was best characterized by arithmetic-reasoning 
tests. It appeared with variances in nearly all reasoning tests, 
and hence was called "general reasoning." It quite often crept into non-reasoning 
tests, especially when they became difficult for the examinees. 
Reasoning II was most characteristic of a figure-matrix test, of the type of 
Raven's Progressive Matrices. It bears some resemblance to Thurstone’s 
induction factor. Reasoning III was most characteristic of a figure-classification 
test in which one of five figures must be selected because it has certain 
properties in common with three other figures. No evidence of a deduction 
factor appeared in the AAF results, probably because there were no strongly 
definitive tests for it in the analyses. 

Blakey performed an analysis especially aimed at reasoning abilities (1). 
He found only two factors identified as reasoning: induction and deduction. 
His battery included only eleven tests, however, probably too few to define 
the whole domain of reasoning. 

Zimmerman re-rotated the reference axes of Thurstone’s initial study 
and analyzed the matrices involved in two other AAF studies (12). He 
identified three reasoning factors, one of which appeared to be a classifying 
ability represented by Thurstone’s Sound Grouping test and his Figure 
Classification test. This suggests the identification of the classification 
factor with the AAF reasoning III. Thurstone’s restrictive factor was identified 
with the AAF reasoning I. Thurstone’s deduction factor was confirmed, 
but it was concluded that some change will probably be needed in its definition. 


Reasoning Hypotheses Postulated for This Study 


On the basis of previous findings and their implications, four reasoning 
factors were assumed to exist. They correspond to the three AAF factors 
plus Thurstone’s factor of deduction. Several different sub-hypotheses were 
set up as to the more precise nature of each of the four factors. These hypotheses 
and sub-hypotheses have served as the logical base for what was recognized 
as a continuing research program in which this study is an exploratory 
investigation. We hoped, in this study, to answer only a few questions: 
(1) whether four factors are sufficient in number to account for the recognized 
domain of reasoning tests; (2) if so, whether the four correspond to the four 
expected; and (3) whether, if they did, some of the special hypotheses concerning 
their properties are better supported than others. In spite of the fact that 
we were not able to test all sub-hypotheses in this study, we present a complete 
account of them as the framework for a program of research. From an 
operational point of view, the sub-hypotheses served as starting points for test 
ideas. One aim was to diversify the types of reasoning tests as much as possible. 
The sub-hypotheses served well as a means to this end. 

Because of the involvement of reasoning I in so many tests, this factor 
might be a very general ability to manipulate symbolic material. A somewhat 
more restrictive hypothesis assumes that reasoning I is a general 
ability to solve problems. Not all thinking is problem solving. If it can be 
shown that this factor is coextensive with tests that do pose problems, this 
second hypothesis would be supported. A still more restrictive hypothesis is 
that reasoning I is the ability to define, formulate, or structure given problems. 
This is an essential step in all arithmetic-reasoning items. The examinee 
must grasp the set of conditions and realize their inter-relationships and their 
contributions to the finding of a solution. A fourth conception of the factor 
is that it is an ability to test hypotheses. The previous conception, ability 
to define the problem, is a matter of forming hypotheses. Several false 
starts are typically made before the correct nature of the problem is grasped. 
The more quickly such errors can be rejected, the better the performance. 
Having correctly conceived the problem and having rejected wrong hypotheses, 
something more remains to be done. There must be a sequence of steps 
organized in order to arrive at the answer. A fifth hypothesis concerning 
reasoning I is accordingly that it is the ability to organize such a sequence 
of steps. Any one of the last three hypotheses could be supported if it turns 
out that tests which feature the kind of operation implied have higher loadings 
in reasoning I than has arithmetic reasoning. 

The favored hypothesis concerning the nature of reasoning II is that 
it is what Thurstone called induction; the ability to see rules or principles in 
a set of objects. The concepts of “rules” and “principles” are rather general 
and perhaps more vague than is desired. There are many kinds of rules and 
principles. It may be that reasoning II can be more narrowly defined as an 
ability to see certain kinds of rules and principles. With this thought in 
mind, and with the fact that a figure-matrix test is a typical measure of 
reasoning II, several more specific hypotheses have been formulated. They 
take a more analytical view of the examinee's task. 

A figure matrix may be conceived by the examinee as presenting a 
system. Each item presents a different system which he must grasp. This 
idea places the emphasis upon the totality of the thing perceived. A second 
specific hypothesis is that reasoning II is an ability to see trends in a series 
of objects. In the matrix test, changes along rows and along columns are 
progressive. A trend may also be regarded as a set of relationships, the same 
relationship being repeated between successive neighbors of a series. A 
hypothesis that reasoning II is a more general ability to see relationships is 
therefore suggested. This hypothesis is also suggested by the fact that reasoning 
II has a component variance in figure-analogies tests. In such a test 
only pairs of objects are involved. It is possible to think of a trend being 
established by a series of two objects, but this is ordinarily conceived as the 
seeing of a relationship. In both figure-analogies tests and in figure-matrix 
tests, however, something more than seeing a relationship is required. The 
examinee must realize that the same identical relationship exists between 
other pairs or runs of objects. Another hypothesis, therefore, emphasizes 
the identifying of the same relationship in different settings. 

One additional hypothesis concerning reasoning II was considered in 
view of one fact. A Gottschaldt-figure test was found to have some variance 
in reasoning II in the AAF results. This test involves the perceptual analysis 
of a closed figure out of a larger closed figure in which it is embedded. If we 
consider what this test has in common with figure-matrix tests, the hypothesis 
suggested is that of a form-analysis ability of some kind, assuming that we 
can regard a figure matrix as a complex form that must be analyzed. Thur- 
stone has found an important perceptual factor in a Gottschaldt-figure test 
which he defined as closure against distracting material (10). It would be 
interesting to find that there is a closure factor common to both perceptual 
and symbolic material. Gestalt psychology would predict such abilities 
cutting across perceptual and thinking domains. 

Reasoning III is regarded as some kind of a classifying ability. The 
formation of a class idea depends upon seeing elements or properties that are 
common to a collection of objects. This act seems central and essential. 
Reasoning III may therefore be an ability to see common elements or proper- 
ties. Tests of reasoning III, however, have usually involved more than seeing 
common properties. They have, for example, required the exclusion from a 
list of objects of one not having the common properties—a classifying (or de- 
classifying) act. The question also arises as to whether the ability is of a 
very general nature, extending to all kinds of objects, or whether it is confined 
to figures or to a limited number of kinds of objects. These contingencies 
have been provided for in two hypotheses. 

In the AAF results, reasoning III was a consistent contributor to vari- 
ance in figure-analogies tests. It is not very easy to see a classifying act in 
an item of that test, unless it comes in the final step. After the examinee has 
in mind the third figure and the relation to be fulfilled, the fourth figure is 
of a certain kind required to fulfill that relation. Which of the alternative 
answers has those properties, or falls in that class? This final act seems 
better described by the Spearman concept of “eduction of a correlate," 
however. This suggests the hypothesis that reasoning III is the ability to 
educe a correlate. This kind of act is not so clear in classification tests, but 
is tolerated here as a possibility. 




The leading hypothesis for reasoning IV is that it is a general ability 
to draw correct inferences from premises. In other words, it is deduction. 
There is still the unsettled question, however, as to the generality of this 
factor. Previous results show that the factor’s leading variances are in formal, 
syllogistic tests. It still remains to be established whether more informal 
deductive tests will show as much variance. The rival hypothesis is that 
reasoning IV is merely a syllogistic-reasoning ability. 

The hypotheses may be summarized in the following outline for the sake 
of ready reference. In describing tests we will make references to the hypothesized 
abilities that are probably emphasized by those tests. 


Reasoning I 
a. Manipulating symbols 
b. Solving problems 
c. Defining problems 
d. Testing hypotheses 
e. Organizing a sequence of related steps 

Reasoning II 
a. Seeing rules or principles (induction) 
b. Seeing systems 
c. Seeing trends 
d. Seeing relations (educing relations) 
e. Seeing identity of relationships 
f. Analyzing forms 

Reasoning III 
a. Seeing common elements or properties 
b. Classifying (in general) 
c. Classifying forms 
d. Educing correlates 

Reasoning IV 
a. Drawing inferences (deduction) 
b. Syllogistic reasoning 

The Tests 

Several tests were constructed especially for this study.* In addition 
to the hypotheses, various considerations helped to determine the kinds of 
tests constructed. Possibly the most important consideration was that all 
the reasoning tests were made quite short, containing from 10 to 25 items. 
This conserves administration time and at the same time probably provides 
sufficient true variance for the purposes of analysis. All tests are group tests, 
most are essentially power tests, and most involve multiple-choice items. 
The […] to be that reasoning factors transcend the kind of 
test material, but to test that idea we used different kinds of material 
(forms, words, letters, and numbers) in different tests. A special effort was 
made to keep the apparent factorial complexity of most tests to a minimum. 

In this first study, the number and kinds of tests were insufficient to 
give us clear examination of all the sub-hypotheses. For some sub-hypotheses 
no unique tests were included in the battery, either because there was less 
faith in those hypotheses or because suitable test ideas were not available, 
with the expectation that those sub-hypotheses would be investigated in 
subsequent studies. Among such sub-hypotheses were I, b, c, and e; II, 
b, c, and e; and III, a. This is not to say that the kinds of abilities described 
by these sub-hypotheses were not involved in any tests, for many of them 
were represented in combination with other hypothesized abilities, as Table 1 
will show. There was also the difficulty that after a test had been developed 
to measure one hypothesized ability we were forced to recognize that it 
probably also measured some other in our list. When a test is reported in 
Table 1 to be an expected measure of two or more abilities it does not mean 
that the abilities are necessarily of equal importance. It was hoped that two 
tests measuring the same pair of abilities, for example, would be slanted 
differently so as to effect separation in meaningful ways in the common-factor 
space. 

*Many of these tests were developed under a contract with the U.S. Navy Electronics 
Laboratory, San Diego, under the monitorship of Arnold M. Small, during 1949. 

Two of the Blakey reasoning tests were included in the battery because 
they had proved to be good measures of factors in his analysis. Reference 
tests were included to define clearly the non-reasoning factors of verbal 
comprehension, numerical facility, perceptual speed, spatial orientation, and 
visualization, which were expected to occur in small degrees in various 
reasoning tests even though a special attempt to minimize these factors was 
made. Each test is described briefly in Table 1.* The hypothetical factor or
factors that each test was designed or selected to measure are indicated in
the last column of Table 1 according to the following code: I (reasoning I;
II, III, and IV have similar meanings); Ia (reasoning I, hypothesis a;
similarly for other hypotheses); N (numerical facility); P (perceptual speed);
S (spatial orientation); V (verbal comprehension); and Vz (visualization).


Testing, Scoring, and Factoring 


The test battery was administered to 144 Officer Candidates at Lackland
Air Force Base and to 139 Air Cadets at Randolph Air Force Base, San
Antonio, Texas.† A study of the age and educational levels of the two groups
showed differences of less than two years. Other statistical comparisons of 
data from the two groups showed that we were justified in combining them 
for a single factor analysis. More details concerning the two samples will be 
found elsewhere (2, 5). 

*Details of the tests are given elsewhere ([illegible], 4). We are indebted
to [name illegible] for permission to use tests 1, 25, 26, 27, and 29.
Test 31 was designed by R. C. Wilson.
†We are very much indebted to Dr. John T. Dailey, then Director, Directorate
of [illegible], Human Resources Research Center, Lackland Air Force Base, for
[illegible] the testing arrangements and for other assistance in carrying out
the testing. We are also indebted to Mr. William B. Lecznar, who assisted in
many ways.

The correlation coefficients computed were mostly Pearson product-
moment r's. A few variables had been dichotomized, and for these either
biserial r's or tetrachoric r's were computed as estimates of Pearson r's.
The coefficients are given in Table 2. They are generally small but
universally positive, ranging from .003 to .602. There are very few zero
correlations, a fact which might suggest an oblique structure and which also
promises difficulty in achieving a unique orthogonal rotational solution.
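
The biserial estimate used for dichotomized variables can be illustrated with a short sketch. It follows the standard textbook formula, r_bis = ((M1 - M0) / s) * (pq / y), where y is the ordinate of the unit normal curve at the point of dichotomy. The function name `biserial_r` and the eight sample scores are assumptions made only for illustration; they are not data from the study, and the paper does not say how its coefficients were computed in practice.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(continuous, dichotomy):
    """Biserial correlation between a continuous score and a 0/1 split.

    Assumes the 0/1 variable is an artificial cut of a normally
    distributed trait, which is the usual premise of the coefficient.
    """
    upper = [x for x, d in zip(continuous, dichotomy) if d == 1]
    lower = [x for x, d in zip(continuous, dichotomy) if d == 0]
    p = len(upper) / len(continuous)   # proportion in the upper group
    q = 1.0 - p                        # proportion in the lower group
    unit_normal = NormalDist()
    # Ordinate of the unit normal curve at the point that cuts off p.
    ordinate = unit_normal.pdf(unit_normal.inv_cdf(p))
    spread = pstdev(continuous)        # SD of all continuous scores
    return (mean(upper) - mean(lower)) / spread * (p * q / ordinate)


# Hypothetical scores: eight examinees, half of them "passing".
scores = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 1, 0, 1, 0, 1, 1]
r_bis = biserial_r(scores, passed)
```

Because the ordinate y is always smaller than the square root of pq, the biserial value is larger in absolute size than the point-biserial coefficient computed from the same data.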

Estimates of reliabilities of the scores were based upon 100 cases chosen 
at random from the Cadet group. The Kuder-Richardson formula 20 was 
applied to all except the highly speeded tests and except for test 21, for 
which a split-half reliability was estimated. For speed tests 28 and 30, retest 
reliability estimates were obtained from secondary sources. The reliability 
estimates are reported in Table 2. 
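
As an illustration of the Kuder-Richardson formula 20 named above, the following minimal sketch computes the coefficient from a matrix of right-wrong item scores. The function name `kr20` and the five hypothetical examinees are assumptions for illustration only; the study's own item data are not reproduced in the article.

```python
def kr20(item_scores):
    """Kuder-Richardson formula 20 for a persons-by-items 0/1 matrix."""
    n_persons = len(item_scores)
    n_items = len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n_persons
    # Variance of total scores, dividing by N as in the classical formula.
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_persons
    # Sum over items of p*q, where p is the proportion passing the item.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in item_scores) / n_persons
        sum_pq += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / var_total)


# Hypothetical data: five examinees, four items (1 = right, 0 = wrong).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(scores)
```

The formula assumes that every examinee attempts every item, which is one reason it was not applied to the highly speeded tests.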

Thirteen factors were extracted by Thurstone’s complete centroid 
method. The thirteenth factor was not rotated because its highest loadings 
were only about .16. The twelfth-factor residuals ranged from -.060 to
+.071, with a distribution that was leptokurtic. The centroid factor matrix
is given in Table 3. 
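
The first step of a centroid extraction can be sketched as follows. The three correlations are taken from Table 2 (tests 1, 4, and 10: .401, .452, and .441). Placing the highest correlation of each column in the diagonal as a communality estimate is a common convention and is an assumption here, as is the function name `first_centroid_factor`. The sketch omits the sign reflections needed before later factors are extracted, and loadings from a three-test sub-battery will not match those in Table 3.

```python
import math


def first_centroid_factor(R):
    """Return first-centroid loadings and the residual matrix.

    R is a square correlation matrix (list of lists) with communality
    estimates in the diagonal.
    """
    n = len(R)
    col_sums = [sum(R[i][j] for i in range(n)) for j in range(n)]
    total = sum(col_sums)
    # Each loading is the column sum divided by the root of the grand sum.
    loadings = [s / math.sqrt(total) for s in col_sums]
    residual = [
        [R[i][j] - loadings[i] * loadings[j] for j in range(n)]
        for i in range(n)
    ]
    return loadings, residual


# Tests 1 (Sound Grouping), 4 (Verbal Analogies I), 10 (Vocabulary).
# Diagonal entries are assumed communalities (highest r in the column).
R = [
    [0.452, 0.401, 0.452],
    [0.401, 0.441, 0.441],
    [0.452, 0.441, 0.452],
]
loadings, residual = first_centroid_factor(R)
residual_row_sums = [sum(row) for row in residual]
```

Each row of the residual matrix sums to zero, which is why some variables must be reflected in sign before the next centroid factor can be taken out.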

The reference axes were rotated independently by two individuals,
hereafter designated as X and Y.* Both used the Zimmerman graphic,
orthogonal system of rotation (11). In the Y solution, the investigator was
guided mostly by Thurstone's objective criteria of positive manifold and
simple structure but paid some attention also to psychological meaningfulness
in terms of knowledge of the tests, of the hypotheses to be tested, and of
previous results with the familiar tests. The other investigator (in the X
rotations) made a frank attempt to put axes through or near the familiar
tests that have consistently defined known factors and to see how many of
the reasoning hypotheses could be verified. Positive manifold and simple
structure were achieved more or less as by-products. The two rotational
solutions gave interpretable factors, 11 of which could be matched in the
two solutions. One in each solution stood alone and only one of these was
interpretable. The two rotated-factor matrices are given in Tables 4 and 5.

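
An orthogonal rotation of one pair of reference axes, the elementary move in a graphic rotation, can be sketched as below. The loadings are the first two centroid loadings of tests 1, 4, and 10 from Table 3; the angle of 35 degrees and the function name `rotate_pair` are assumptions chosen only for illustration and do not reproduce any rotation actually made by X or Y.

```python
import math


def rotate_pair(loadings, p, q, degrees):
    """Rotate factor columns p and q of a loading matrix orthogonally."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    rotated = []
    for row in loadings:
        new_row = list(row)
        new_row[p] = c * row[p] + s * row[q]
        new_row[q] = -s * row[p] + c * row[q]
        rotated.append(new_row)
    return rotated


# First two centroid loadings of tests 1, 4, and 10 (Table 3).
centroid = [
    [0.506, 0.315],
    [0.587, 0.387],
    [0.494, 0.416],
]
rotated = rotate_pair(centroid, 0, 1, 35.0)  # 35 degrees is hypothetical

# Sum of squared loadings per test, before and after rotation.
before = [sum(a * a for a in row) for row in centroid]
after = [sum(a * a for a in row) for row in rotated]
```

Because the rotation is orthogonal, each test's sum of squared loadings is unchanged, which is why the communalities of the rotated matrices stay close to those of the centroid matrix.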
Interpretation of the Factors

The factors are presented in the general order of their familiarity and
definiteness. Any test having a loading in either of the solutions of .30 or
higher will be considered in the interpretations. The loadings will be listed
according to solutions X and Y, respectively. Tests are identified by number
and name.
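
The listing rule just stated (a loading of .30 or higher in either solution) can be expressed as a small filter. The loadings on the verbal factor shown here are taken from Tables 4 and 5; the function name `listed_tests` and the tuple layout are assumptions for illustration.

```python
THRESHOLD = 0.30  # minimum loading in either solution


def listed_tests(rows, threshold=THRESHOLD):
    """Keep tests whose loading reaches the threshold in X or in Y."""
    return [row for row in rows if max(row[2], row[3]) >= threshold]


# (test number, name, loading in X, loading in Y) on the verbal factor.
verbal_factor = [
    (10, "Vocabulary", 0.58, 0.60),
    (13, "Verbal Analogies II", 0.42, 0.28),
    (24, "Correlate Completion", 0.22, 0.30),
    (28, "Numerical Operations", 0.04, 0.00),
]
selected_numbers = [row[0] for row in listed_tests(verbal_factor)]
```

Tests 13 and 24 are retained although only one of their two loadings reaches .30, whereas test 28 is dropped.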
A. Verbal comprehension (V)         X     Y
   10 Vocabulary                   .58   .60
   14 Inference                    .50   [illegible]
    4 Verbal Analogies I           .48   .47
   13 Verbal Analogies II          .42   .28
    1 Sound Grouping               .41   .47
   12 Word Classification          .36   .38
   24 Correlate Completion         .22   .30

*One of the authors, R. F. Green, and Wayne S. Zimmerman.


TABLE 1

Summary of Test Requirements and Corresponding Hypotheses

Test                             Task Required for Item                                    Hypothesized
                                                                                           Factor Content

 1. Sound Grouping               Which one of five words sounds different?                 [illegible]
 2. Figure Classification        Define classes of figures and assign other figures to     [illegible]
                                 the correct classes
 3. Letter Triangle              Find system in a triangular pattern of letters            Ibe
 4. Verbal Analogies I           Multiple-choice analogy, first pair difficult             [illegible]
 5. Figure Matching              Select figure having most in common with given figure     Id, [illegible]
 6. Essential Operations         Which information is not essential to the solution of     Ice
                                 the arithmetical reasoning problem?
 7. Remote Verbal Similarities   Select word that has most in common with given word       [illegible]
 8. Prescribed Relations         Select figure that embodies the stated changes of the     [illegible]
                                 given figure
 9. Problem Solving              Solve arithmetic-reasoning problems                       [illegible]
10. Vocabulary*                  Indicate meaning of word presented in brief context       V
11. Figure Exclusion             Which one of five figures does not belong?                Iac
12. Word Classification          Which one of five class names does not belong?            [illegible]
13. Verbal Analogies II          Multiple-choice analogy, second pair difficult            [illegible]
14. Inference                    Select correct one of five conclusions from given         IVa
                                 statement
15. Hidden Figures               Indicate which of five figures is contained in given      [illegible]
                                 figure
16. Number and Operations I      Which equation is true after a certain interchange of     Ibd
                                 signs or numbers is introduced?
17. Number and Operations II     What interchanges will make the equations true?           [illegible]
18. Number and Operations III    First discover which interchange corrects main            Iabed
                                 equation, then select an equation that is made true
                                 by that interchange
19. Figure Matrix                Discover system of changes in a 3x3 matrix of figures     [illegible]
20. Word Matrix                  Discover system of changes in a 2x3 matrix of words       IIbede
21. Ship Destination             Find best port for ship, considering the influences of    Ia
                                 several variables
22. Syllogisms                   Select correct syllogistic conclusions                    IVb
23. Figure Analogies Completion  Draw figure which correctly completes figure analogy      [illegible]
24. Correlate Completion         Complete last correlate of pair series                    IIade, [illegible]
25. Secret Writing               Decode numbers representing letters                       Id, IIa
26. Identical Forms*             Which form is the same as the one given?                  P
27. False Premises               Is syllogism (nonsense type) true or false?               IVb
28. Numerical Operations*        Indicate correct answer in simple arithmetical            N
                                 computation
29. Punched Holes*               Indicate pattern of holes in unfolded paper after it      Vz
                                 is punched while folded
30. Perceptual Speed*            Which form is the same as the one given?                  P
31. Symbol Manipulation          Mark symbolically presented "If-Then" statements true     IVab
                                 or false
32. Form Reasoning               Solve simple equations given in terms of familiar         Ia
                                 forms
33. Circle Reasoning             Discover rule for marked circle in patterns               Ibe
34. Space Orientation            Determine position in space from which picture was        Vz, S
                                 taken

*Test included as a reference test because of its known factor content.


TABLE 2
The Correlation Matrix*

Test   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
  1.     231 252 401 177 264 185 300 258 452 126 230 339 349 246 203 260
  2. 281     224 272 147 138 210 271 304 209 192 195 248 174 144 315 286
  3. 252 224     185 084 158 119 314 442 001 104 144 193 169 252 267 331
  4. 401 272 185     188 313 284 325 363 441 113 275 425 463 285 285 355
  5. 177 147 084 188     096 044 247 070 161 141 106 213 121 207 144 123
  6. 264 138 158 313 096     167 261 354 252 154 110 223 307 123 205 257
  7. 185 210 119 284 044 167     199 162 275 064 150 190 252 194 216 157
  8. 300 271 314 325 247 261 199     422 363 143 119 281 313 345 387 393
  9. 258 304 442 363 070 354 162 422     228 263 197 181 412 356 304 469
 10. 452 209 091 441 161 252 275 363 228     110 243 334 456 261 252 242
 11. 126 192 194 113 141 154 064 143 263 110     042 280 098 209 234 133
 12. 239 195 144 275 106 110 150 119 197 243 042     176 269 048 178 126
 13. 339 248 193 425 213 223 190 281 181 334 280 176     414 208 287 253
 14. 349 174 169 463 121 307 252 313 412 456 098 269 414     250 270 325
 15. 246 144 252 255 207 123 194 345 356 261 209 048 208 250     265 305
 16. 203 315 267 285 144 205 216 387 304 252 234 178 287 270 265     439
 17. 260 286 331 355 123 257 157 393 469 242 133 126 253 325 305 430
 18. 274 279 322 326 192 278 219 376 393 205 152 134 247 305 288 433 496
 19. 138 247 224 184 250 208 134 356 339 093 237 140 124 103 179 306 267
 20. 316 209 225 443 132 278 221 362 356 300 136 202 381 397 374 358 350
 21. 133 095 250 151 083 210 040 255 372 131 155 105 066 131 077 118 236
 22. 340 212 228 391 162 313 149 362 382 405 189 214 323 422 277 251 305
 23. 332 378 308 357 253 220 188 438 447 240 308 137 204 340 421 382 389
 24. 328 231 303 400 226 252 230 379 357 423 189 213 298 387 336 324 334
 25. 255 318 439 349 177 281 176 449 420 213 202 170 232 286 272 406 397
 26. 174 216 220 236 121 097 091 227 254 151 177 060 053 150 184 195 257
 27. 301 110 183 388 226 416 159 343 327 286 140 136 359 386 209 982 216
 28. 207 106 178 099 023 251 089 202 175 142 101 063 137 172 114 262 204
 29. 252 491 343 212 202 154 151 299 480 078 311 231 273 253 383 355 312
 30. 122 230 272 156 220 092 139 278 282 073 316 069 099 152 228 321 319
 31. 369 295 136 384 141 234 136 313 272 308 082 099 302 383 214 348 263
 32. 003 136 250 043 187 107 057 290 206 160 148 114 104 156 207 275 313
 33. 182 200 358 212 096 124 155 217 410 143 165 239 248 192 202 333 370
 34. 198 129 255 248 280 211 117 453 382 106 196 090 120 129 279 274 271

*Decimal points have been omitted.


TABLE 2, Continued

Test  19  20  21  22  23  24  25  30  31  32  33
  1. 138 316 133 340 332 328 255 122 369 003 132
  2. 247 209 095 212 378 231 318 230 295 136 200
  3. 294 225 250 228 398 303 439 272 136 250 358
  4. 184 443 151 391 357 400 349 156 384 043 212
  5. 250 132 083 162 253 226 177 220 141 187 096
  6. 208 278 210 313 229 252 281 092 234 107 124
  7. 134 221 040 149 188 230 176 139 136 057 155
  8. 356 302 255 362 438 379 449 278 313 290 217
  9. 330 356 372 382 447 357 420 282 272 266 410
 10. 003 300 131 405 240 423 213 073 308 160 143
 11. 237 136 155 189 308 189 202 316 082 148 165
 12. 140 202 105 214 137 213 170 069 099 114 239
 13. 124 381 006 323 201 208 232 099 302 104 248
 14. 103 397 131 422 340 387 286 152 383 156 192
 15. 179 374 077 277 421 336 272 228 214 207 202
 16. 306 358 118 251 382 324 406 321 348 275 333
 17. 207 359 236 305 389 334 397 319 263 313 370
 18. 237 312 226 299 381 353 --- --- 227 283 271
 19.     232 357 229 325 221 408 343 118 299 212
 20. 232     108 375 495 436 333 094 304 068 221
 21. 357 108     164 199 161 207 179 142 258 203
 22. 229 375 164     385 343 328 197 378 092 170
 23. 325 495 199 385     441 416 303 285 239 216
 24. 221 436 161 343 441     339 165 250 196 320
 25. 408 333 207 328 416 339     376 303 407 391
 26. 226 116 211 067 219 077 284 550 148 383 250
 27. 249 364 210 413 318 332 298 064 301 103 208
 28. 132 119 236 105 210 211 275 224 263 288 348
 29. 267 338 081 202 491 255 358 387 206 242 318
 30. 343 094 179 197 303 105 376     075 602 293
 31. 118 304 142 378 285 250 303 075     149 127
 32. 299 168 258 092 239 196 407 602 149     346
 33. 212 221 263 170 216 320 391 293 127 346
 34. 247 323 243 373 407 237 365 178 110 225 364

Columns 18, 26 to 29, and 34, and the column of reliabilities (r_tt), are
not legible in the source; --- marks an entry that could not be recovered.

*Decimal points have been omitted.


TABLE 3

Centroid Factor Loadings and Communalities*

Test   Centroid factors, in order of extraction (column labels illegible)   h²
  1   506  315 -040 -073  051 -088  083 -248 -090 -028  072  030  456
  2   458 -053  151 -196  034 -089  214  045 -148 -101  142  093  392
  3   496 -223  133  147 -085  051  066 -150 -069  027  081  144  405
  4   587  387  058 -064 -084  088 -085  104 -123 -109  069 -108  578
  5   326 -023 -069 -143  295  076 -061  035  160 -182 -136  032  309
  6   443  196 -151  226  030  067  108  054 -096  086 -062 -092  358
  7   329  153  097 -093 -111 -064  018  083 -027 -112 -075 -101  195
  8   631 -049 -090  115  124 -157 -145  058  084 -160  068  034  525
  9   657 -119  110  210 -044  173 -051 -102 -174  130  176  133  643
 10   494  416 -122 -148 -104 -136 -136 -123  021 -163 -026  092  553
 11   354 -150  074 -112  171  144  095 -070 -112  140 -238  065  322
 12   312  161  066 -090 -153  178  080 -044  048 -180  082  080  247
 13   490  294  102 -166  102  076  160  131  107  116 -008  058  461
 14   555  400 -030 -059 -155  080 -122  050  052  179  056  172  588
 15   490 -032  144 -032  108 -142 -258 -112  047  123 -087  037  400
 16   582 -081  109 -029 -032 -187  145  203  142  064  028 -094  490
 17   605 -117  102  095 -171 -114 -084  161 -075  092  117 -016  502
 18   578 -004  053  111 -205 -248  071  145 -075  052 -051  081  501
 19   472 -246 -143  078  119  072  079  134 -005 -219 -120  037  429
 20   575  239  248  115  102 -046 -127  055  057  064 -034 -086  510
 21   371 -173 -228  280 -074  143  079 -130 -114 -109 -052  074  380
 22   556  268 -056  130  159  078 -110  045 -051  043  089  097  471
 23   667 -080  188  042  241 -125 -104 -043 -077  062 -052  071  530
 24   587  170  118  081 -066 -092 -107 -079  105 -064 -186  086  482
 25   652 -193 -062  119 -048  016  090  124  032 -076  100  060  527
 26   400 -285 -257 -272 -125  050 -110 -069 -157  047  081 -244  510
 27   515  277 -108  222  149  134  033  104  061  108 -063 -098  500
 28   362 -065 -214  130 -191 -159  266 -124  142  216 -116 -100  436
 29   561 -250  372 -205  275  182  085 -058  065  054  151  090  737
 30   486 -467 -228 -351 -121  114 -143  112 -153  153 -088 -050  748
 31   473  227 -147 -080  069 -193  103  063  050  109  269  ---  454
 32   446 -431 -306 -128 -224  031 -126  057  247  065 -129  112  659
 33   501 -209  142  086 -322  218  113 -169  308  023  043 -158  637
 34   500 -174  032  186  274  140 -191 -136  171 -117  167 -182  569

*Decimal points have been omitted.
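
The communality in the last column of Table 3 is the sum of the squared loadings of a test on the twelve centroid factors. A minimal check on the row for test 1 (Sound Grouping) is sketched below; the variable names are assumptions for illustration.

```python
# Row for test 1 in Table 3, decimal points omitted as in the table.
row_test_1 = [506, 315, -40, -73, 51, -88, 83, -248, -90, -28, 72, 30]

# Restore the decimal point, then sum the squared loadings.
loadings = [value / 1000 for value in row_test_1]
communality = sum(a * a for a in loadings)
communality_rounded = round(communality, 3)
```

The rounded sum agrees with the tabled communality of .456 for this test.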


TABLE 4
Final Rotated Matrix-X*

      A  B  C  D  E  F  G  H  I  J  K  L
Test  V  N  P  Vz S  GR LR PR CR CP EC SS h²
1 а 55 eüà № № 30 о Иос Ево 
2 9i 07 0 32 19 82 —02 —08 М 00 03 25. 895 
8-0 18 и 9» 10 369.18). Ш Om, ПОШ 
à d$ a o 05 16. 16 (28 Ш бо 09 550 
17 03 12 30 16 —0 01 35 —02 —05 12 05 313 
6 16 35 04 08 Of 14 38 201 а 03 00 —07 361 
7 95 05 10 09 06 04 OL 01 26 06 20 07 208 
8 07 17 10 14 28 16 20 35 16 03 30 28 528 
8 o ой в 19, Ш 9 За ора 646 
10 в ла: öl 02) 02 № ше 01 41 12 555 
iios бо м а 0 ПО EINE 0007 10 00 —10 329 
12 36 —02 —04 07 08. 155» 00 а 108 235 Ориз 2255 
13 42 15 o4 36 00 —05 28 04 19 03 02 13 467 
14 50 1 10 05 —03 0 47 =06 —01 19 15. 17 5 
15 (08 . бб 17. 21 16 (OMNEA 05 и м 40 10 402 
в d. Bo, 32. [29 A. АТ 84. 09, 1L 35 491 
1? бб лт 20 008 „ 18" 207 24. ко 35 19 17 34 509 
18 оз 27 19 п -00 28 `17 02 35 10 22 34 508 
19) 00 dad 2 22 MAL ОКО әз 00 0 12 431 
20 dy я вия 39 32 13 30 10 512 
ее Sai, 18 08. 02: 38 1088. А59 12 22 —02 —04 388 
= 7 0295 185 09 ми 

29 25 05 01 13 18 19 7 27 14 

23 00 10 т и 2 2 8 м d P A у 594 
21 22 16 04 2 -0 09 21 20 21 2 490 
Db. Oum 58, 20. eds! 22 ТО 26 20 02 33 535 
26 12 09 з —08 40 11 -02 = од 12 06 00 512 
27 з 5 о 09 10 00 5 3 26 03 01 —04 508 
28 04 59 15 —0 0 04 09 о 13 11 05 04 438 
55. is 100 p тг м 86y РО 01 27 00 20 736 
зо. 066 0 79 о 235 05 038 (05 09 13 01 16 749 
DEEST NUR ULNC IA PE 20 г о 
32 01 36 57 —02 09 —08 —01 25 —06 28 08 34 666 
88 Абды ово 09. 10. йу ШИ OD OS 22 67 01 14 646 
34 —07 "Ten CUM rex 08 27 15 01 579 


04 


* Decimal points have been omitted. 


TABLE 5

Final Rotated Matrix-Y*

        A   B   C   D   E   F   G   H   J   K   L   M
Test    V   N   P  Vz   S  GR  LR  PR  CP  EC  SS   ?   h²
  1    47  25  04  22  00  18  12  01  01  16  23  00  446
  2    21  04  15  37 -05  04 -02  20  24 -01  27  03  389
  3    04  15  08  31  13  33 -06  01  28  08  12  20  400
  4    47 -03  08  04  01  24  32  18  21  22  22 -07  570
  5    17  04  12  14  29 -03  04  35 -10  11 -02  04  302
  6    12  27  01  00  05  34  28  15  03  05  20 -04  351
  7    26  08  07  04 -04  03  12  15  23  14  04 -04  196
  8    15  13  14  10  18  19  03  32  07  36  29  24  512
  9    09  07  11  35  17  53  12 -02  24  14  23  18  630
 10    60  13  04  00 -01  03  20  10  04  24  14  19  546
 11    02  16  19  39  12  14  17  15  03 -05 -12  02  317
 12    38 -03 -02  06  13  12  03  06  23 -05  08  02  245
 13    28  13  00  24  17 -04  42  20  16 -01  16 -04  452
 14    --  03  00  07  13  14  48 -04  16  --  23  21  ---
 15    09  11  14  29  13  06  19  04  08  41  02  19  395
 16    01  22  10  15  12 -02  17  25  38  17  32  06  485
 17    01  06  21  13  01  25  17  10  37  25  29  20  491
 18    04  24  11  18 -11  17  17  19  37  16  24  28  491
 19    03  13  21  13  11  29 -04  46  06  00  07  14  415
 20    17  10 -06  20  14  15  33  18  21  40  17 -04  499
 21    06  24  08  01  10  47 -10  14  04 -04  04  19  378
 22    26  04 -03  16  16  30  31  16 -04  21  29  10  466
 23    07  16  14  46  12  19  16  24  10  37  16  14  588
 24    30  23 -04  14  12  13  22  18  23  32  02  20  473
 25    07  15  20  13  20  30  02  28  28  07  31  21  516
 26    12  06  66  01 -05  18 -01 -01  08  06  07  05  504
 27    15  24 -02 -02  23  30  38  21 -03  12  24 -05  489
 28    00  56  14 -06  08  08  10 -03  19 -02  15  12  432
 29    00 -09  24  64  39  04  02  --  21  --  12 -02  731
 30   -02  00  75  14  10  16  14  12  -- -04 -03  26  740
 31    26  14  09  12  06 -02  19  06  02  08  53  12  452
 32   -02  16  46 -07  32  06  03  14  20 -02 -02  49  648
 33    10  24  16  02  42  22 -04 -07  55  07  01  02  628
 34    08  06  15  15  47  30 -10  12  01  38  16 -05  553

-- Entry not legible in the source.

*Decimal points have been omitted.


This is clearly the verbal-comprehension factor. The list of tests is
headed, as usual, by the Vocabulary test, and it contains most of the verbal
tests in the battery (seven of the ten). In the construction of verbal tests not
appearing in the list, efforts had been made to minimize the verbal-factor
variance, apparently with some success.

The verbal variance is higher than expected in two of the tests, Inference
and Sound Grouping. In the former, not enough attention was paid to
keeping vocabulary simple, for such words as "martyr," "allegiance," and
"privileges" appear. The loading in the Sound Grouping test is supported
by Zimmerman's finding of a loading of .30 (12). It may be that the
interpretation of the verbal factor should play down the term "comprehension"
or should even substitute the label "verbal knowledge," for the Sound
Grouping test requires only that words be known well enough to pronounce
them. It is possible, however, that learning meanings and pronunciations go
together and therefore remain correlated.


B. Numerical facility (N)*                  X     Y
   28 Numerical Operations                 .59   .56
   16 Number and Operations Changes I      .29   .22
   31 Symbol Manipulation                  .28   .14


This is the well-defined numerical-facility factor. On the whole, the
number variances appeared in tests where expected but in such small
quantities that they were of no help in rotations in this respect. Their low
loadings, however, help to make reasonable the forcing of an axis through the
Numerical Operations test.


C. Perceptual speed (P)             X     Y
   30 Perceptual Speed             .79   .75
   32 Form Reasoning               .57   .46
   26 Identical Forms              .58   .66
   11 Figure Exclusion             .31   .19


This is the visual, perceptual-speed factor. The two reference tests
put into the battery to account for this factor are high in the list.

The high loadings for Form Reasoning came as no surprise. In spite
of the fact that Blakey had found this test to be factorially pure on what he
thought was a reasoning factor (1), it looks like a perceptual-speed test,
at least for superior examinees. The items are so easy that the task is a
matter of rapid identification of simple geometric forms.

*Here we have liberalized the rule of a minimum loading of .30 by extending
the lower limit of listed tests to .28 in order to avoid basing interpretation
on a single test.



D. Visualization (Vz)               X     Y
   29 Punched Holes                .64   .64
   23 Figure Analogies Completion  .44   .46
   11 Figure Exclusion             .41   .39
   13 Verbal Analogies II          .36   .24
    2 Figure Classification        .32   .37
    5 Figure Matching              .30   .14
    3 Letter Triangle              .22   .31
    9 Problem Solving              .19   .35


This is the visualization factor that was distinguished from a space 
factor in the Army Air Force research (3). It is most heavily weighted in 
Thurstone's Punched Holes test, which was put into the battery to identify it. 

The visualization factor is defined as an ability to manipulate or trans- 
form a pictorially or verbally given object into another visual arrangement. 
Its appearance in a verbal test is not new. It has shown small variances in 
such tests as Reading Comprehension and Arithmetic Reasoning. It is 
apparently of general utility in problems that require reasoning. This does 
not make it a reasoning factor; it is probably merely an aid in problem solving. 
It is more likely to come into play in connection with pictorially presented 
problems, as attested by the preponderance of figural tests in the list. 


E. Spatial orientation (S)          X     Y
   34 Spatial Orientation          .50   .47
   26 Identical Forms              .40  -.05
   29 Punched Holes                .36   .39
   33 Circle Reasoning             .19   .42
   32 Form Reasoning               .09   .32


This factor is probably the same as that called "spatial relations," S1,
in the Army Air Force research (3). Beyond the first and third tests in the
list, the factor loadings of the two solutions differ more than usual. Circle
Reasoning might well be expected to have some of this spatial-factor variance,
as it has in the Y solution. In Form Reasoning the order of the forms as to
right-left is a feature of the items. There is no reason to expect variance of
this factor in Identical Forms. From these points of view, the Y rotation
comes closer to expectations.

The factor S1 has come more and more to mean the ability to appreciate
the spatial order or arrangement of objects, with the observer's own body
as the frame of reference (7). The name "spatial orientation" is used here in
preference to "spatial relations" because it more aptly describes an adaptive
ability.




F. General reasoning (GR)           X     Y
    9 Problem Solving              .43   .53
    3 Letter Triangle              .36   .33
   21 Ship Destination             .38   .47
    2 Figure Classification        .32   .04
    1 Sound Grouping               .30   .13
   25 Secret Writing               .24   .30
   22 Syllogisms Test              .19   .30
    6 Essential Operations         .14   .34
   34 Spatial Orientation          .04   .30
   27 False Premises               .00   .30


The two solutions agree as to the three leading tests on this factor.
The factor is evidently reasoning I or the general-reasoning factor defined
in the Air Force research (3). An arithmetic-reasoning test, like Problem
Solving, was the chief defining variable. As usual, it has loadings in a rather
wide variety of tests.

One of the systematic differences between the two solutions is that the
Y list includes tests such as False Premises and the Syllogism test that are
distinctive of the group under factor G, defined as logical reasoning. In
Zimmerman's re-rotation of Thurstone's data from his primary-mental-
abilities study, he identified the general-reasoning factor but did not find
any of the logical-type tests loaded on it (12). False Premises and another
syllogism test were in Thurstone's battery. Thus Zimmerman's solution on
the Thurstone battery is consistent with the X solution on our reasoning
battery.

This apparent separation of logical from non-logical processes in reasoning,
as we have here in factors F and G, will be stressed in forming a new
hypothesis concerning factor F, general reasoning. Comments from here on
will be based primarily on the X rotations.

Test 21, Ship Destination, was originally constructed as a measure of
reasoning I (general reasoning) on the assumption that the ability is a matter
of symbol manipulation. Some of the other tests on this factor, however,
do not fit the symbol-manipulation hypothesis very well, for example, Figure
Classification and Sound Grouping.

It has been observed before (3) that reasoning I seems characteristic
of tests whose items cover a wide range of difficulty. Tests that have rather
homogeneous and low levels of difficulty are commonly lacking in this factor.
It is true that some tests with wide range of difficulty do not have variance
in this factor. Is there any difference between these tests and others graded
in difficulty which do have variance in the factor general reasoning?

For the most part, tests of graded difficulty that have no variance in
this factor are of a more logical type. Most of the tests that appear on this
factor involve the manipulation of abstract symbols that have little realistic
meaning. As problems get harder, trial-and-error manipulation may
reasonably be assumed to become more important. This factor, then, may
represent a less logical manipulation of symbols in a trial-and-error fashion.
This would explain why difficult items in almost any type of test introduce
some general-reasoning variance. This conception is very close to Heidbreder's
idea of "spectator behavior" (6). It may also be akin to Tolman's concept
of VTE (vicarious trial and error). Effective trial-and-error approach would
presumably be most dependent upon the speed with which new solutions 
are tried and on the ease with which failing solutions are rejected. Such 
behavior is likely to be common to most of the leading tests in the list for 
this factor. 

Other hypotheses as to the nature of reasoning I can still be entertained. 
A hypothesis of symbolic span, that is, the ability to manipulate
simultaneously a large number of symbols or to apprehend a more complex pattern
of symbols, has much merit. A speed-of-symbol-manipulation hypothesis 
could also be entertained, since effectiveness of trial and error would depend 
upon how efficiently the wrong trials are handled. 


G. Logical reasoning (LR)           X     Y
   14 Inference Test               .47   .48
   22 Syllogism Test               .47   .31
   27 False Premises               .45   .38
   20 Word Matrix                  .39   .33
    9 Problem Solving              .39   .12
    6 Essential Operations         .38   .28
   31 Symbol Manipulation          .32   .19
   23 Figure Analogies Completion  .31   .16
   13 Verbal Analogies I           .28   [illegible]


This is probably the same factor that has been defined as deduction by
other investigators. All three of the strictly logical, syllogistic type of tests
are conspicuously loaded on it.

Presumably the chief process in deductive reasoning is the drawing of
inferences or conclusions. But note that the tests that have been used to
define the so-called deductive factor have been of the multiple-choice or
true-false form. The examinee does not have to draw his own conclusion.
Conclusions are given to him and what he must do is to decide which one is
correct. Deciding about the correctness of a conclusion may be a different
psychological process than producing the conclusion. It may rather be an
[illegible] judgment or evaluation. The criterion of evaluation in these tests is
that of logical necessity. We might describe this factor as a sensitivity to
logical necessity. The name we have chosen, "logical reasoning," is of broader
connotation to allow for other possible descriptions. It remains to be seen
whether deduction tests in the form of completion items will give rise to an
additional factor distinct from this one. If they do, it will be interesting to
see whether such a factor is identical with our factor K (eduction of
correlates) to be discussed later.



H. Eduction of perceptual relations* (PR)   X     Y
   19 Figure Matrix                        .41   .46
   34 Spatial Orientation                  .38   .12
   21 Ship Destination                     .36   .14
    5 Figure Matching                      .35   .35
    8 Prescribed Relations                 .35   .32
   27 False Premises                       .32   .21


The leading test in this list, Figure Matrix, suggests that this factor is 
the same as the Air Force reasoning II, in which the same kind of test was 
also a leader (3). In the Air Force analysis a figure-analogies test was about 
equally loaded in the factor. In our battery we had no orthodox figure- 
analogies test, so we do not have this possibility of confirmation of the 
identity of the factor. Prescribed Relations, however, is a variant form of 
the figure-analogies test.

Our main hypothesis II stressed seeing relationships, systems, trends,
or patterns. The many tests that were thought to be related by communality
in such a factor divided three ways in our analysis, factor H representing
one of them. We then had the task of distinguishing the three groups of
tests, under factors H, I, and J. Under factor H are perceptual tests, with
the exception of False Premises, which has a loading of .32 in the X solution.
The three tests on which the two solutions agree definitely involve the
comparison of figures and the noting of relationships. We therefore name
this factor “eduction of perceptual relations.” We would lay stress on the
contrast between this group of tests and the next. That the
Figure Matrix test goes into the one group and the Word Matrix test,
similar in form, but differing in content, goes into another, is a most striking
finding.
I. Eduction of conceptual relations (CR)       X
 4 Verbal Analogies I                         .38
18 Number and Operations Changes III          .35
17 Number and Operations Changes II           .34
16 Number and Operations Changes I
20 Word Matrix
 6 Essential Operations
*In this and in three other factor names we have found it desirable to use the
term “eduction,” a term given prominence by Spearman in connection with his
principles of cognition (8).




This factor was identified in solution X only. It constitutes a rather 
meaningful picture, however, so we will attempt to interpret it. All the tests 
except possibly Essential Operations involve the grasping of relationships. 
In Verbal Analogies I an attempt had been made to emphasize this very 
process. Verbal Analogies II is not in the list, as we expected, since the 
seeing of relationships in it was made as easy as possible. Number and Opera- 
tions Changes I was not expected in such a list but it may require more 
relational thinking than was anticipated. Since the relationships involved 
in these tests, by contrast to those for factor H, are of verbal and numerical 
types, the factor has been named “eduction of conceptual relations." 

This is the first instance that we know of in factor-analysis results where 
reasoning factors separate along the lines of the material content of the items 
as they do in factors H and I. This finding is contrary to the belief that 
reasoning abilities transcend the kind of material about which one reasons. 
It does, however, lend some support to the distinction sometimes made 
between concrete and abstract reasoning. On the other hand, none of our 
results supports the distinctions sometimes made between verbal reasoning 
and numerical or quantitative reasoning. Both types of relationships are 
involved in the tests loaded with this factor. 


J. Eduction of conceptual patterns (CP)        X     Y
33 Circle Reasoning                           .67   .55
 9 Problem Solving                            .37   .24
 3 Letter Triangle                            .35   .28
16 Number and Operations Changes I            .09   .38
17 Number and Operations Changes II           .19   .37
18 Number and Operations Changes III          .10   .31


The Circle Reasoning test and the Letter Triangle test both require 
the examinee to find a rule or system. A seeing-rules or a seeing-systems
hypothesis best fits factor J. It is probably the same as Thurstone's induction 
factor, which he defines as the ability to see rules or principles. 

There are no strictly perceptual tests in the list, where properties of 
visual objects determine their relationships. The patterns conceived are 
conceptual rather than perceptual, hence the term “conceptual” in the 
factor name. Since we have proposed a distinction between seeing perceptual 
relationships and seeing conceptual relationships in factors H and I, one 
might expect by analogy two factors for the eduction of patterns. This is a 
possibility to be explored in future investigations. At any rate, there is some 
indication in factor J that the formation of patterns is something more than 
or is different from the eduction of relationships. 

The presence of Problem Solving in this list of tests is worthy of comment. 
It has been known that arithmetic-reasoning tests are factorially complex 




and that there is much true variance in Problem Solving still to be accounted
for. Finding that an ability of educing a pattern is involved in this test is
not unreasonable. Hypothesis Ic, defining problems (or structuring problems),
was proposed as one conception of reasoning I, the leading component of
the variance in that test. Structuring an arithmetical problem preparatory
to its solution is a form of production of a conceptual pattern. One puzzling
consideration, however, is the fact that Essential Operations, which was
designed to examine hypothesis Ic, did not come out on factor J. It may be
that the form of that test is not suitable for detecting variance in this factor
and that some other form of item will be needed to isolate from the arith-
metic-reasoning test that particular step of defining the problem.

The two solutions are in disagreement over the presence of the three 
Number and Operations Changes on this factor. If the Y solution is correct, 
since equations are involved in these tests, it may be that grasping an equation 
as a structure is the key to the presence of this factor. Reorganizing an
equation involves the formation of a new structure of a kind that is not unlike
the structuring needed in arithmetic-reasoning items.


K. Eduction of correlates (EC)                 X     Y
24 Correlate Completion                       .41   295
10 Vocabulary                                 .41   .24
15 Hidden Figures                             .40   .41
23 Figure Analogies Completion                .32   .37
 8 Prescribed Relations                       .30   .36
20 Word Matrix                                .30   .40
34 Spatial Orientation                        .15   .38


Three tests were constructed to see whether a factor of this kind would
emerge: Figure Analogies Completion, Prescribed Relations, and Verbal
Analogies II. The first two of these are in the list above. It is possible to
explain why Verbal Analogies II was not. The other two are of the com-
pletion type, in which the examinee has to produce an answer. Verbal Ana-
logies II is a multiple-choice test in which answers are given. Correlate
Completion is essentially an analogies test in completion form. It is probable
that almost any form of analogies test that calls for the production of an
answer will be substantially weighted with this factor, which we have called
“eduction of correlates.”

The presence of some other tests in the list calls for some attempts at
explanation. Vocabulary tests are usually of complexity one for the verbal-
comprehension factor, though some occasionally have secondary variances
in reasoning abilities. The vocabulary test that we used presents each word
to be defined in a brief context. It is possible that the context furnishes just
enough in the form of relationships in some instances to make possible a
correct answer when the word itself is unknown. Very liberal time was allowed
for the examinees to complete this test, a condition favorable for the use of
such secondary cues.

There is no obvious explanation for the presence of the Hidden Figures 
test in this list. In the Air Force results it had shown an affinity for the test 
Figure Matrix, but in this analysis it separated from that test. In Thurstone’s 
analysis of perception, this test helped to define a new factor defined as the 
ability to effect closure against distractions (10). That factor accounted for 
only a portion of its true variance, however, so there is room for other variance. 
It is not likely that our factor K is the same as Thurstone’s closure factor.

It should be mentioned that in the X rotations there seemed to be 
some genuine correlation between factor K and factor G, logical reasoning. 
Such a correlation could arise from the fact that no judgment as to the logical 
soundness of a conclusion can be made unless a conclusion has been educed. 
It is possible that an inference test in completion form would measure this 
factor of eduction of correlates. 


L. Symbol substitution (SS)                    X     Y
16 Number and Operations Changes I
17 Number and Operations Changes II           .34   .29
18 Number and Operations Changes III          .34   .24
32 Form Reasoning                             .34  -.02
25 Secret Writing                             .33   .31
31 Symbol Manipulation                        .28   .58


The most obvious thing that these tests have in common is the need for 
substituting one symbol for another. It is therefore hypothesized as a sym- 
bol-substitution ability. In every one of the tests in this list, there is a 
substitution of symbols according to rules. The symbols substituted take 
on the meanings or the functions of the symbols they replace. 

The factor might be of a more general nature than is demonstrated in 
these results. It might include all thinking in which new symbols are assigned, 
whether they replace old ones or not. Further research will be needed to 
examine the generality of this factor. It will be of interest, too, to determine 
relationships of this ability to performance in certain branches of mathe-
matics and in symbolic logic.


M. (Unidentified)                              Y
32 Form Reasoning                             .49
18 Number and Operations Changes III          .28
30 Perceptual Speed                           .26


No hypothesis regarding the possible nature of this factor is offered.
It might possibly be the factor that Blakey had found defined uniquely by
the Form Reasoning test. It seems likely, however, that the variance given
this factor by the Y solution belongs in factor L. The facility in switching
symbol meanings is an obvious feature of this test.


Discussion 


An attempt will now be made to relate the factors to the hypotheses
set forth in preparation for this study and to previous factorial results.
There are some general issues, also, that call for comment.

Among the twelve obtained factors, seven are common to tests that
were designed as reasoning tests. Whether all of these should be regarded
as reasoning abilities is a matter of definition of the domain of reasoning.

Factors were found corresponding to the four major hypotheses but not
on a one-to-one basis. The well-established general-reasoning factor (factor
F) was substantiated and new hypotheses concerning its properties that
should be fruitful for future research were mentioned.

The area described as reasoning II (in general, the inductive area)
seems to require three dimensions to describe it factorially. These are factors
H, I, and J, which involve educing relations and patterns.

No factor was found that could be described as a classifying ability (hy-
pothesis III). The classification tests had no common factor unique to them.
Since non-reasoning factors account for the larger part of their variances, it
would seem that classifying activities are not very much dependent upon
reasoning. An exception to this is the involvement of general reasoning and
this is probably true only when the classifying task becomes difficult. It
appears that classifying tests have almost nothing to offer in the way of
measurement of reasoning abilities.

To meet the fourth major hypothesis, we found a factor that has formerly
been called “deduction,” but which in our opinion should be called “logical
reasoning.” It is suggested that this is an ability to evaluate inferences and
may not be the same ability at all as that for drawing conclusions for one's
self.

Next will be considered briefly which of the initial minor hypotheses
were upheld and which ones were not, taking each of the four groups in turn.
Some can be regarded as favored by the results, others as candidates for
discarding. Most of them need further investigation.

No results particularly favored hypotheses Ia, manipulating symbols,
Ib, solving problems, or Id, testing hypotheses, as definitions of reasoning I.
If the factor called “eduction of conceptual patterns” is sustained as defined
here, reasoning I is not a matter of hypothesis Ic, defining problems. We
had no good tests in the battery for testing Ie, organizing a sequence of steps.
The two tests in which such a factor might have emerged separated in the
analysis. This negative finding is not sufficient reason for rejecting this
hypothesis without further investigation.



Our factor J, eduction of conceptual patterns, is essentially in line with 
the idea of hypothesis IIa, seeing rules or principles. This kind of pattern is 
meaningful and it may be based upon various kinds of material—forms, 
letters, or numbers. The distinction between a rule or principle, IIa, a system, 
IIb, or a trend, IIc, may be actually insignificant from a psychological point
of view. We had in the battery no really discriminating tests which would 
have made possible a clear separation of such abilities, if they are separate. 
The tests designed to examine the hypothesis of seeing (educing) relations, 
IId, divided into at least two groups—those involving figures in one group 
and those involving numbers and words in the other. The consequence was 
the identifying of two factors, H and I, educing perceptual relations and 
educing conceptual relations. Finally, there were no tests sufficiently unique 
to do justice to the hypothesis of seeing identity of relationships, IIe, and no
factor, even the one appearing in the test Hidden Figures, could be identified 
as IIf, analyzing forms. 

As was stated above, no classification factor of any kind emerged. We 
failed, then, to find any support for hypotheses IIIa, seeing common elements
or properties, IIIb, classifying in general, or IIIc, classifying forms. This
might be because seeing identities is a matter of seeing relationships, in the 
sense that identity is a limiting case of relationship. The failure to find a 
factor of this kind is somewhat curious in that traditional psychology has 
made so much of the processes of abstraction, equivalent stimuli, and transfer 
by reason of “identical elements." A factor that could be described as ша, 
eduction of correlates, was demonstrated. It did not occur in classification 
tests. It did occur to best advantage in completion tests rather than in 
multiple-choice tests. It is possible that if any factor is to receive the label 
“deduction” this should be it. It is suggested, however, that the term “de- 
duction" be dropped as a psychological concept. 

The tests designed to examine hypothesis IVa, drawing inferences
(deduction), would have led to the support of this idea over that of IVb,
syllogistic reasoning, which restricts the supposed ability to formal, syllo-
gistic tasks. The reason is that all the tests, formalized or not, came out
together on a factor. The factor was not designated as deduction, however,
but as “logical reasoning,” for reasons cited under factor G.

One obtained factor was not anticipated by any of the initial hypotheses,
that of “symbol substitution.” Although the facile attachment and replace-
ment of symbols in connection with ideas is undoubtedly a feature of steps
in the solving of certain types of problems, such activity would hardly be
described as a species of reasoning. It would, however, come under the general
concept of thinking.

A question of a more general nature that we hoped would be answered
to some degree by this study should be mentioned. It appears that the
reasoning factors do not always transcend material lines. There is no evidence
of a distinction between verbal and numerical reasoning, but there is evidence
of a distinction between concrete and abstract reasoning in the two factors 
eduction of perceptual relations (concrete reasoning) and eduction of con- 
ceptual relations (abstract reasoning). That is as far as the distinction goes, 
so far as our evidence is concerned. We may be justified, however, in looking 
for a factor of “eduction of perceptual patterns” as a possible parallel to the 
obtained factor of eduction of conceptual patterns. 

Which of the reasoning factors are identifiable with those previously
found? In answering this question, we will consider only the major studies
that were given most attention in the planning of this investigation. General
reasoning corresponds to Thurstone’s restrictive reasoning and the Army
Air Force reasoning I. Logical reasoning parallels Thurstone’s and Zimmer-
man's deduction, but a significantly different interpretation has been given
to it.

Thurstone’s induction factor may either be regarded as having been 
analyzed in three new dimensions—eduction of perceptual relations, eduction 
of conceptual relations, and eduction of conceptual patterns—or it may be 
regarded as the same factor as the last of these three. The latter also has 
some resemblance to the Air Force reasoning III. The eduction of perceptual
relations and the eduction of conceptual relations have no exact antecedents
in previous factorial results, except that the former is probably equivalent
to the Air Force reasoning II.

Eduction of correlates and symbol substitution, as defined, have no
parallels in previous factorial findings. Eduction of correlates comes closest
to what should be called deduction, but it would be a somewhat special
case of deduction found in reasoning by analogy. It is our belief
that a genuine deduction factor has not been found and that if it exists it
will take completion tests to find it.

In [...], we give passing attention to the [...] reasoning. We have been
referring to several [...]. Obviously, whether or not any [...] definition of
reasoning. We actually started with two definitions. One was a logical
definition that referred to [...] toward solving unfamiliar problems. The
other was an [...] that referred implicitly to a collection of tests [...]
tests. We now have a third possibility. We can define [...] the factors
obtained from analyses [...] which factors in the list, we [...] to cover
[...]. It is our belief that if reasoning is to be [...] give us an acceptable,
operational [...] dependability. We do not feel that the time is ripe for this
until more is known about the factors.




REFERENCES


1. Blakey, R. I. A factor analysis of non-verbal reasoning tests. Educ. psychol. Measmt. 
1941, 1, 187-198. 

2. Green, R. F. A factor-analytic study of reasoning abilities. Ph.D. dissertation, Uni- 
versity of Southern California Library, 1951. 

3. Guilford, J. P. (Ed.) Printed Classification Tests. Army Air Forces Aviation Psychology 
Reports, Report No. 5, Washington, D.C.: Government Printing Office, 1947. 

4. Guilford, J. P., Comrey, A. L., Green, R. F., and Christensen, P. R. A factor-analytic
study of reasoning abilities, I. Hypotheses and description of tests. Reports from the
Psychological Laboratory, No. 1. The University of Southern California, 1950.

5. Guilford, J. P., Green, R. F., and Christensen, P. R. A factor-analytic study of reason-
ing abilities, II. Administration of tests and analysis of results. Reports from the
Psychological Laboratory, No. 3. The University of Southern California, 1951.

6. Heidbreder, E. An experimental study of thinking. Arch. Psychol., 1924, 11, No. 73.

7. Michael, W. B., Zimmerman, W. S., and Guilford, J. P. An investigation of two
hypotheses regarding the nature of spatial relations and visualization factors. Educ.
psychol. Measmt., 1950, 10, 187-213.

8. Spearman, C. The abilities of man. New York: The Macmillan Company, 1927.
9. Thurstone, L. L. Primary mental abilities. Psychometric Monographs No. 1. Chicago: 
Univ. Chicago Press, 1938. 
10. Thurstone, L. L. A factorial study of perception. Psychometric Monographs No. 4. 
Chicago: Univ. Chicago Press, 1944. 

11. Zimmerman, W. S. A simple graphical method for orthogonal rotation of axes. Psycho- 
metrika, 1946, 11, 51-55. 

12. Zimmerman, W. S. The isolation, definition, and measurement of spatial-visualization 
abilities. Ph.D. Dissertation, University of Southern California Library, 1949.


Manuscript received 9/5/52 


Revised manuscript received 11/7/52 




PSYCHOMETRIKA—VOL. 18, NO. 2
JUNE, 1953 


A METHOD FOR FACTORING LARGE NUMBERS OF ITEMS 


Robert J. WHERRY
THE OHIO STATE UNIVERSITY 
AND 
Ben J. WINER 
UNIVERSITY OF NORTH CAROLINA 


The computation of intercorrelation matrices involving large numbers
of variables and the subsequent factoring of these matrices present a formi-
dable task. A method for estimating factor loadings without computing the
intercorrelation matrix is developed. The estimation procedure is derived
from a theoretical model which is shown to be a special case of the multiple-
group centroid method of factoring. Empirical checks have indicated that the
model, even though it makes some stringent assumptions, can be applied to a
variety of variables found in psychological factoring problems. It has been
found to be particularly useful in factoring test items.


Historical Introduction 


While some variant of the Thurstone group factoring method is quite 
satisfactory for factoring 15 or 30 or possibly even 50 tests or items, the 
method involves intercorrelation and residual tables of order n², i.e., for
200 items such tables would have 40,000 entries each. Thus large numbers 
of items make some other approach desirable if not absolutely necessary. 

Wherry and Gaylord in 1943 (10) suggested an iterative approach,
based upon successively determined r,, values, for factoring items. This
iterative approach was used successfully in several major studies. Wherry
(9) factored 292 rating scale descriptive phrases concerning Army officers
in 1950. The iterative approach was also used in three doctoral disserta-
tions at Ohio State University: Gordon (3) factored 300 personality test
items in 1950; Phelps (7) factored 11+ need-activity items in 1951; and
Lucas (5) factored 90 need-satisfaction items in 1951.

At the American Psychological Association meetings in Chicago in 1951,
Loevinger, et al. (4) and Gleser, et al. (2) presented a slightly different
iterative approach, which involved the obtaining of the intercorrelations
among items within various subtests in addition to several refined methods
for testing and refining the clusters. Final clusters were apparently left
oblique although intercorrelations were minimized.

Wherry, Perloff, and Campbell (11) showed that item factors obtained 




by the Wherry-Gaylord approach were the same as those obtained by factor- 
ing the intercorrelations among thirteen arbitrary subtests set up by expert 
opinion. The thirteen subtests had been selected in an attempt to speed 
up the original Wherry-Gaylord method, but had iterated back into only 
three patterns. The present paper grew out of further consideration of prin- 
ciples presented in that paper. 

In the autumn of 1950, two large item-factoring problems were begun 
at Ohio State. The Personnel Research Board was preparing to factor 120 
leadership description items. Wherry was preparing to analyze 300 teacher 
description items. The present method was worked out to achieve common 
factors without iteration and at the same time preserve any specific factors 
which might be found in any of the expert-constructed a priori subtests. 
A direct method of factoring items was worked out at that time by analogy 
with the traditional group factoring method for tests. 


The Direct Method of Factoring 


In the standard multiple-group centroid analysis, the table of inter-
correlations of the basic variables is partitioned into groups which have
relatively high within-group correlations and relatively low correlations
with variables in other groups. Each of the groups $(A, B, \ldots, K, \ldots, M)$
defines a centroid reference vector in an oblique reference frame. These
centroid reference vectors will be designated $X_A, X_B, \ldots, X_K, \ldots, X_M$.
The matrix of oblique factor loadings is obtained from projections of the
variables onto these centroid vectors. For those variables belonging to an
arbitrary group $K$, the projections on centroid vector $X_K$ would be given by

\[
r_{iK} = \frac{h_i^2 + \sum_{j} r_{ij}}{\sqrt{\sum_{j} h_j^2 + S_K}}
\qquad (i, j \text{ in } K;\ i \neq j), \tag{1a}
\]

where the notation $S_K$ indicates the sum of all elements in the matrix of
intercorrelations of the variables in group $K$ excepting the elements of the
form $r_{ii}$. For variables not in group $K$, the projections on centroid vector
$X_K$ are given by

\[
r_{iK} = \frac{\sum_{j} r_{ij}}{\sqrt{\sum_{j} h_j^2 + S_K}}
\qquad (i \text{ not in } K;\ j \text{ in } K). \tag{1b}
\]

The cosines of the angular separations of the centroid vectors are given
by

\[
r_{KL} = \frac{\sum_{k}\sum_{l} r_{kl}}
{\sqrt{\sum_{k} h_k^2 + S_K}\;\sqrt{\sum_{l} h_l^2 + S_L}}
\qquad (k \text{ in } K;\ l \text{ in } L). \tag{2}
\]
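Equations (1a), (1b), and (2) can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the function name
`centroid_projections`, the four-variable correlation matrix, the communalities,
and the grouping are all hypothetical, the groups are taken to be disjoint, and
the communalities are assumed to be known in advance.

```python
import math

def centroid_projections(R, h2, groups):
    """Oblique loadings on each group centroid (eqs. 1a, 1b) and the
    cosines between centroids (eq. 2).

    R      : full correlation matrix as a list of lists (diagonal ignored)
    h2     : communality estimates, one per variable
    groups : disjoint lists of variable indices, one list per group
    """
    # Length of each unnormalised centroid: sqrt(sum of h^2 + S_K).
    norms = []
    for g in groups:
        s_k = sum(R[i][j] for i in g for j in g if i != j)
        norms.append(math.sqrt(sum(h2[i] for i in g) + s_k))

    n = len(R)
    loadings = [[0.0] * len(groups) for _ in range(n)]
    for k, g in enumerate(groups):
        for i in range(n):
            total = sum(R[i][j] for j in g if j != i)
            if i in g:
                total += h2[i]  # eq. (1a): communality stands in for r_ii
            loadings[i][k] = total / norms[k]

    cosines = [[1.0] * len(groups) for _ in groups]
    for k, gk in enumerate(groups):
        for l, gl in enumerate(groups):
            if k != l:
                cross = sum(R[i][j] for i in gk for j in gl)
                cosines[k][l] = cross / (norms[k] * norms[l])
    return loadings, cosines

# Hypothetical data: variables 0 and 1 load .8 and .6 on one factor,
# variables 2 and 3 load .7 and .5 on a second, uncorrelated factor.
R = [[1.00, 0.48, 0.00, 0.00],
     [0.48, 1.00, 0.00, 0.00],
     [0.00, 0.00, 1.00, 0.35],
     [0.00, 0.00, 0.35, 1.00]]
h2 = [0.64, 0.36, 0.49, 0.25]
groups = [[0, 1], [2, 3]]
loadings, cosines = centroid_projections(R, h2, groups)
```

With exact communalities and a clean two-group structure, the projections
reproduce the generating loadings and the centroids come out orthogonal; with
real data neither would be expected to hold exactly.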




The process of transforming projections from oblique to orthogonal reference
frame can readily be interpreted as a part-correlation procedure. In carrying
out the process, one selects an arbitrary vector as the pivot; after the pivot
vector has been selected, the order in which the correlation between vectors,
represented by the cosines of the angles between vectors, is “parted out”
is arbitrary. The various solutions obtained by selecting an arbitrary vector
as a pivot and arbitrary order thereafter are equivalent in the sense that
one can be rotated into any other.

Let us designate the orthogonal reference axes obtained by the trans-
formation process as $X_{I}, X_{II}, \ldots, X_{m}$. Suppose we select reference
vector $X_A$ as the pivot vector. The projections of the variables on the ortho-
gonal axes could then be expressed by the following part-correlations:

\[
\begin{aligned}
r_{iI}   &= r_{iA}, \\
r_{iII}  &= r_{i(B \cdot A)}, \\
r_{iIII} &= r_{i(C \cdot AB)}, \\
         &\;\;\vdots \\
r_{im}   &= r_{i(M \cdot ABC \cdots (M-1))},
\end{aligned} \tag{3}
\]

where the notation $r_{i(j \cdot kl)}$ designates the part-correlation of variable $i$ with
variable $j$ (i.e., the correlation of variable $i$ with that part of variable $j$ which
is independent of variables $k$ and $l$). A modified Doolittle solution applied
to the matrix of the intercorrelations of the centroid reference vectors pro-
vides a convenient method for computing the matrix for effecting this
transformation. This method is outlined in a later section of this article.
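The parting-out can be illustrated numerically. The sketch below does not
reproduce the modified Doolittle layout the authors refer to; it substitutes a
Cholesky factorization of the matrix of centroid cosines, which gives the same
successive part-correlations when the pivot is taken first and the remaining
vectors are parted out in the order listed. The function names and the three
numerical values are hypothetical.

```python
import math

def cholesky(C):
    """Lower-triangular L with L L^T = C, where C holds the centroid cosines."""
    m = len(C)
    L = [[0.0] * m for _ in range(m)]
    for r in range(m):
        for c in range(r + 1):
            s = sum(L[r][k] * L[c][k] for k in range(c))
            if r == c:
                L[r][c] = math.sqrt(C[r][r] - s)
            else:
                L[r][c] = (C[r][c] - s) / L[c][c]
    return L

def orthogonal_loadings(oblique_row, L):
    """Solve L f = p by forward substitution.

    p holds one variable's projections on the oblique centroids;
    f holds its loadings on the orthogonal axes (the part-correlations).
    """
    f = []
    for r, p in enumerate(oblique_row):
        s = sum(L[r][k] * f[k] for k in range(r))
        f.append((p - s) / L[r][r])
    return f

# Hypothetical values: projections of one variable on centroids A and B,
# and the cosine of the angle between the two centroids.
r_iA, r_iB, r_AB = 0.60, 0.50, 0.40
L = cholesky([[1.0, r_AB], [r_AB, 1.0]])
f = orthogonal_loadings([r_iA, r_iB], L)

# Textbook part-correlation of the variable with B, with A parted out of B.
part = (r_iB - r_iA * r_AB) / math.sqrt(1.0 - r_AB ** 2)
```

Here `f[0]` equals the projection on the pivot and `f[1]` equals `part`, which
is the two-centroid case of the sequence of part-correlations in the text.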


An Indirect Method for Estimating Factor Loadings

In equations (1a) and (1b) it will be noted that only the sums of subsets
of the $r_{ij}$ in the intercorrelation matrix enter into the computations. It
is the purpose of the indirect factor method to obtain direct estimates of the
sums needed in equations (1a) and (1b) which do not involve using the
individual $r_{ij}$. Since this method of factoring has been found to be particu-
larly useful in working with test items, it will be described in the item-
factoring setting. If the assumptions underlying the estimation processes
are met, the procedures to be described are, however, quite general in appli-
cation.

One starts the analysis with a large pool of items (e.g., [...] items).
The items are grouped into subtests on the basis of expert opinion. Fortu-
nately this expert opinion need not be too expert. One necessary restriction
is that the experts do not achieve the rather impossible task of creating
completely alternate forms. Since the purpose of the grouping of the items
by the experts is the exact opposite of this condition, subtests resulting
from areas of agreement among the expert judgments will generally prove to
be satisfactory as a starting point for the analysis. A second necessary con-
dition is that the number of subtests set up be at least as great as the rank
of the matrix of inter-item correlations.

All items are then administered to a population of N persons (preferably
at least 100), whose responses are to be analyzed. All papers are scored on 
each of the established subtests and the matrix of intercorrelations between 
the subtests computed. In addition one computes the correlations between 
each item and each of the subtests.* With limitations to be indicated, these 
steps provide a set of data directly analogous to that required by the multiple- 
group factor method. Designating the correlation of an item with an arbitrary 


subtest by $r_{iK'}$, and the correlation between any two subtests by $r_{K'L'}$,
we have
(4) 


In equation (4) the subtests are considered to define vectors analogous to 
centroids of groups. Converting the oblique subtest vectors into orthogonal 
coordinate axes, we have 


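\[
\begin{aligned}
r_{iI}  &= r_{iA'}, \\
r_{iII} &= r_{i(B' \cdot A')}, \\
        &\;\;\vdots \\
r_{im}  &= r_{i(M' \cdot A'B'C' \cdots (M-1)')}.
\end{aligned} \tag{5}
\]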

By means of equations (5) the correlations of items with correlated sub- 
tests are converted into factor loadings on a set of orthogonal reference 
vectors. The latter vectors can then be rotated to psychological meaning- 
fulness. 

Equations (4) and (5) are based upon the assumption that the vectors
defined by the scores on the subtests are satisfactory estimates of the centroid
vectors that would be defined by the centroids of the corresponding inter-
item correlations. One apparent limitation of this assumption is that $r_{iK'}$
would reflect a spurious effect contributed by the correlation of an element
with a total of which it is a part and hence include the contribution of a
specific factor. However, even with very crude corrections for this fact,
the direct use of equations (4) and (5) yielded highly meaningful factors and
loadings in the studies which have been indicated above. Methods for elim-
inating this source of spurious correlation are developed in what follows.

Further considerations show two not so apparent limitations in the
estimation procedure. For those items not in subtest $K'$, $r_{iK'}$ will in general
be an underestimate of $r_{iK}$, since items that might belong to $K$, but not
included in $K'$, will not be properly weighted in defining the vector $X_K$.
There is also a bias in $r_{K'L'}$ as an estimate of $r_{KL}$, since the intercorrelations
of the subtests will in general be underestimates of the correlations between
groups in the common-factor space. The following sections point out the
origin of these biases or spurious elements in the estimates and develop
adjustment procedures whereby statistics based upon the subtests more
closely approximate those based upon the actual centroids.

*For purposes of the development we assume all correlations to be product-
moment. In actual applications of the method, tetrachorics are recommended. A fuller
discussion of type of coefficient appropriate for the analysis appears in a later
section.


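The Transformation of Item-subtest Correlation Coefficients into
Projections on Group Centroid Vectors

Case 1. Items in Subtest

The product-moment correlation of an item with a subtest of which it
is a part is given by

\[
r_{iK'} = \frac{\sigma_i^2 + \sum_{j} \sigma_i \sigma_j r_{ij}}
{\sigma_i \sqrt{\sum_{i} \sigma_i^2 + \sum_{i}\sum_{j} \sigma_i \sigma_j r_{ij}}}
\qquad (i \neq j). \tag{6}
\]

If we assume that the standard deviations of the items are equal, we have

\[
r_{iK'} = \frac{1 + \sum_{j} r_{ij}}{\sqrt{n_{K'} + S_{K'}}}. \tag{7}
\]

It will be noted that equation (7) has the same form as equation (1a). In
order to express $r_{iK}$ as a function of $r_{iK'}$, we note that

\[
\sigma_{K'}^2 = \sum_{i} \sigma_i^2 + \sum_{i}\sum_{j} \sigma_i \sigma_j r_{ij}
\qquad (i \neq j). \tag{8}
\]

If we assume $\sigma_i = \sigma_j = \sigma_I$, equation (8) becomes

\[
\sigma_{K'}^2 = \sigma_I^2 \, (n_{K'} + S_{K'}). \tag{9}
\]

Substituting from (9) into (7), we have

\[
r_{iK'} = \frac{1 + \sum_{j} r_{ij}}{a_{K'}}, \tag{10}
\]

where $a_{K'} = \sigma_{K'}/\sigma_I$. From (9) we also note that

\[
S_{K'} = \frac{\sigma_{K'}^2}{\sigma_I^2} - n_{K'} = a_{K'}^2 - n_{K'}. \tag{11}
\]

Assuming $S_{K'}$ equal to $S_K$, we can substitute in (1a) to obtain

\[
r_{iK} = \frac{h_i^2 + \sum_{j} r_{ij}}{\sqrt{a_{K'}^2 - \sum_{j} (1 - h_j^2)}}. \tag{12}
\]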



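If now we solve equation (10) for $\sum_{j} r_{ij}$ and substitute in equation (12), we
have

\[
r_{iK} = \frac{a_{K'} r_{iK'} - (1 - h_i^2)}{\sqrt{a_{K'}^2 - \sum_{j} (1 - h_j^2)}} \tag{13}
\]

or computationally

\[
r_{iK} = \frac{(a_{K'} r_{iK'} - 1) + h_i^2}{\sqrt{a_{K'}^2 - n_{K'} + \sum_{j} h_j^2}}. \tag{13a}
\]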


If the items in a subtest are factorially homogeneous, the average corre-
lation of a given item with all other items in the subtest can be used as
a first estimate of its communality. Assuming $\sum_{j} h_j^2 = S_{K'}/(n_{K'} - 1)$, equa-
tion (13) becomes upon simplification

\[
r_{iK} = \frac{a_{K'} r_{iK'} - 1 + h_i^2}
{\sqrt{\dfrac{n_{K'}}{n_{K'} - 1}\,(a_{K'}^2 - n_{K'})}}. \tag{14}
\]

In order to estimate the value of a single communality, let us assume

\[
h_i^2 = \frac{\sum_{j} r_{ij}}{n_{K'} - 1}.
\]

Under this assumption, equation (10) becomes

\[
r_{iK'} = \frac{1 + (n_{K'} - 1)\, h_i^2}{a_{K'}}. \tag{15}
\]

Solving (15) for $h_i^2$, one obtains

\[
h_i^2 = \frac{a_{K'} r_{iK'} - 1}{n_{K'} - 1}. \tag{16}
\]

Substituting from (16) into (14), we have after simplification

\[
r_{iK} = (a_{K'} r_{iK'} - 1)\sqrt{\frac{n_{K'}}{(n_{K'} - 1)(a_{K'}^2 - n_{K'})}}. \tag{17}
\]

Equation (17) admittedly involves several assumptions and approxi-
mations, but it does provide a useful tool for getting first approximations
to $r_{iK}$. By means of the relationship

\[
r_{iK}^2 = h_i^2,
\]

equation (13) can then be used successively as an iterative device to secure
better estimates. Iteration can be concluded when none of the values change
materially (say not more than .01) after iteration.




A numerical example will serve to illustrate the technique. Suppose
that we have a subtest of six items, each with a difficulty of .50 and with
unknown factor loadings of

Item       1     2     3     4     5     6
Loading   .10   .30   .40   .50   .60   .80

The item intercorrelations, also unknown, would be

Item     1      2      3      4      5      6      Σ
1      1.00    .03    .04    .05    .06    .08    1.26
2       .03   1.00    .12    .15    .18    .24    1.72
3       .04    .12   1.00    .20    .24    .32    1.92
4       .05    .15    .20   1.00    .30    .40    2.10
5       .06    .18    .24    .30   1.00    .48    2.26
6       .08    .24    .32    .40    .48   1.00    2.52
Σ      1.26   1.72   1.92   2.10   2.26   2.52   11.68


The following data about the subtest would be observed:

$n_{K'} = 6$; $M_{K'} = 3.00$; $\sigma_{K'}^2 = 2.92$; and

Item       1     2     3     4     5     6
r_{iK'}   .37   .50   .56   .61   .66   .74

From the information contained in these observed data, we can deduce the
unknown factor loadings.

Assuming the items are dichotomous, we would have

\[ \bar{p}_{K'} = \frac{M_{K'}}{n_{K'}}, \tag{18} \]

where $\bar{p}_{K'}$ is the mean difficulty. For the numerical example

\[ \bar{p}_{K'} = 3.00/6 = .50. \]

Then from the equation

\[ \bar{\sigma}_i^2 = \bar{p}_{K'}\,(1 - \bar{p}_{K'}) \tag{19} \]

we obtain

\[ \bar{\sigma}_i^2 = (.5)(.5) = .25, \]

whence

\[ a_{K'}^2 = \sigma_{K'}^2/\bar{\sigma}_i^2 = 2.92/.25 = 11.68. \]

Either equation (17) or equation (13) can now be used to obtain a first
approximation to $r_{iK}$. If equation (13) is used, $r_{iK'}^2$ can be used as an
estimate of $h_i^2$. Let us designate this first approximation to $r_{iK}$ by either
equation (17) or (13) as ${}_1r_{iK}$. More exact approximations can be obtained by using
equation (13) iteratively. The results of continuing this iterative process




through three steps for the numerical example under consideration are given 
in Table 1. 


TABLE 1

Illustrative Example of Iterative Procedure

($n_{K'} = 6$; $a_{K'} = \sqrt{11.68} = 3.418$)

Item   a_{K'}r_{iK'} - 1   1h_i^2   2h_i^2   r_{iK'}   1r_{iK}   2r_{iK}   3r_{iK}   r_{iK}
1           0.265           .137     .021      .37      .145      .106      .103      .10
2           0.709           .250     .119      .50      .345      .308      .300      .30
3           0.914           .314     .195      .56      .441      .413      .405      .40
4           1.085           .372     .275      .61      .524      .506      .500      .50
5           1.256           .436     .370      .66      .608      .605      .605      .60
6           1.529           .548     .558      .74      .747      .778      .796      .80

                                              Step 1    Step 2    Step 3
Σ h_i^2                                        2.057     1.538     1.504
D^2 = (a_{K'}^2 - n_{K'}) + Σ h_i^2
    = 5.680 + Σ h_i^2                          7.737     7.218     7.184
1/D                                            .3595     .3722     .3731


In this example, the iterative process converges quite rapidly; after 
three steps all estimates are within .005 of the theoretical values. It should 
be noted that this iterative procedure is analogous to the usual iterated 
centroid method for stabilizing the estimates of the communalities and 
reduces to it if all assumptions implicit in equation (13) are met. 
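
The iteration summarized in Table 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' computing routine: the function name `iterate_projections` and its arguments are hypothetical, the starting communalities are taken as the squared observed item-subtest correlations, and each step applies equation (13a) with $h_i^2$ replaced by the square of the previous estimate of $r_{iK}$.

```python
import math

def iterate_projections(r_obs, a_sq, n_items, steps=3):
    """Iterate equation (13a), starting from h_i^2 = r_iK'^2 (assumed start)."""
    a = math.sqrt(a_sq)
    c = [a * r - 1.0 for r in r_obs]           # a_K' * r_iK' - 1 for each item
    h2 = [r * r for r in r_obs]                # first communality estimates
    history = []
    for _ in range(steps):
        d = math.sqrt((a_sq - n_items) + sum(h2))      # D in Table 1
        r_est = [(ci + hi) / d for ci, hi in zip(c, h2)]
        history.append(r_est)
        h2 = [r * r for r in r_est]            # relationship r_iK^2 = h_i^2
    return history

# Observed values from the numerical example in the text.
r_obs = [0.37, 0.50, 0.56, 0.61, 0.66, 0.74]
history = iterate_projections(r_obs, a_sq=11.68, n_items=6)
for step, est in enumerate(history, start=1):
    print(step, [round(x, 3) for x in est])
```

With the example data, the third step gives values that agree with the last computed column of Table 1 to within rounding.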

The iterative procedure outlined here can be carried out simultaneously
for several groups and convergence is usually quite rapid. The various
conditions under which convergence is not rapid or in which the convergence
is toward incorrect values have been investigated (12). It was found that
only in extreme cases not usually encountered with ordinary test data will 
the iterative process fail to converge to correct values. The values of the 
communalities computed in the manner described are, of course, approxi- 
mately equal to the square of the projections of the items on the centroid 
of the subtest containing the items. Additional increments to the communality 
of the items are obtained from projections of the items on the centroids of 
subtests which do not contain the items. 

It should be recalled that the method used in this section applies only 
to items included in subtests and assumes that items have approximately 
equal variances within any subtest. 


Case 2. Items Not in The Subtest 


The adjustments necessary to obtain $r_{iK}$ from $r_{iK'}$ for items falling under
Case 2 are simpler than those made for items falling under Case 1. For
items not included in subtest $K'$, the value of $r_{iK}$ is given by equation (1b).
Under the assumption of equal item variances for items in subtest $K'$ we
have

\[ r_{iK'} = \frac{\sum_j r_{ij}}{a_{K'}} \qquad (i \text{ not in } K';\ j \text{ in } K'). \tag{20} \]

If now we substitute in equation (1b) the value of $\sum_j r_{ij}$ given by equation
(20) and also for the denominator of equation (1b) the expression obtained
from equation (11), we have

\[ r_{iK} = \frac{a_{K'} r_{iK'}}{\sqrt{a_{K'}^2 - n_{K'} + \sum_j h_j^2}} = C_{K'}\, r_{iK'}, \tag{21} \]

where

\[ C_{K'} = \frac{a_{K'}}{\sqrt{a_{K'}^2 - n_{K'} + \sum_j h_j^2}} \]

or computationally

\[ C_{K'} = \frac{1}{\sqrt{1 - \dfrac{n_{K'} - \sum_j h_j^2}{a_{K'}^2}}}. \tag{21a} \]

The denominator of the adjustment factor, $C_{K'}$, will have been computed in
the last step of the iteration process for the items in subtest $K'$. Thus all
of the values on the right side of equation (21) are known.

To illustrate the application of equation (21), suppose in the numerical
problem presented under Case 1, we also have items 7 through 10 which are
not included in subtest $K'$. Suppose the unknown theoretical factor loadings
for these items on the same factor considered above to be


Item       7     8     9    10
Loading   .00   .20   .50   .70

The unknown theoretical intercorrelations would be

Item     7      8      9     10
1       .00    .02    .05    .07
2       .00    .06    .15    .21
3       .00    .08    .20    .28
4       .00    .10    .25    .35
5       .00    .12    .30    .42
6       .00    .16    .40    .56
Σ       .00    .54   1.35   1.89

The observed $r_{iK'}$ values would therefore be

Item       7      8      9     10
r_{iK'}   .00    .16   .395   .55

We would therefore have

\[ C_{K'} = 3.418/2.679 = 1.276. \]

From equation (21) we could compute the following values of $r_{iK}$:

Item      7      8      9     10
r_{iK}   .000   .204   .504   .702


It will be noted that $r_{iK'}$ values for items in Case 2 are underestimates
of the corresponding $r_{iK}$. In general the value of $C_{K'}$ in equation (21) will
be greater than unity. On the other hand, for items falling in Case 1, $r_{iK'}$
overestimates the relatively low $r_{iK}$ and underestimates the relatively high
$r_{iK}$. One would expect the errors of estimation to be in this direction, since
the contribution of the specifics would be relatively greater for those items
having small $r_{iK}$ values.
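
As a check on the arithmetic, the Case 2 adjustment can be sketched as follows. The helper name `adjustment_factor` is hypothetical; it evaluates the computational form (21a), using the final sum of communalities from Table 1 (1.504) as input. Because the text rounds intermediate values, the results agree with the printed figures only to about the third decimal place.

```python
import math

def adjustment_factor(a_sq, n_items, sum_h2):
    """Equation (21a): C = 1 / sqrt(1 - (n - sum of h^2) / a^2)."""
    return 1.0 / math.sqrt(1.0 - (n_items - sum_h2) / a_sq)

# Subtest values from the numerical example (a^2 = 11.68, n = 6, sum h^2 = 1.504).
c_k = adjustment_factor(11.68, 6, 1.504)

# Observed item-subtest correlations for items 7 through 10 (not in the subtest).
r_obs = [0.00, 0.16, 0.395, 0.55]
r_adj = [c_k * r for r in r_obs]              # equation (21)

print(round(c_k, 3), [round(r, 3) for r in r_adj])
```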


Transforming Correlations between Subtests
into Correlations between Factors

While the procedures developed so far have enabled us to estimate the
$r_{iK}$ from the $r_{iK'}$, we still need to convert correlations between subtests,
$r_{K'L'}$, into cosines of the angular separations of the centroid vectors $X_K$
and $X_L$ or, what is equivalent, into the correlations between subtests in the
common-factor space, $r_{KL}$. For the correlation between subtests $K'$ and $L'$,
assuming all items within each subtest have approximately equal difficulties
(but all subtests need not necessarily have the same difficulty), we have

\[ r_{K'L'} = \frac{\sum_i\sum_j r_{ij}}{(\sigma_{K'}/\bar{\sigma}_i)(\sigma_{L'}/\bar{\sigma}_j)} = \frac{\sum_i\sum_j r_{ij}}{a_{K'}\,a_{L'}} \qquad (i \text{ in } K';\ j \text{ in } L'). \tag{22} \]

The correlation between these two subtests in the common-factor space
would, however, be

\[ r_{KL} = \frac{\sum_i\sum_j r_{ij}}{\sqrt{(a_{K'}^2 - n_{K'} + \sum_i h_i^2)(a_{L'}^2 - n_{L'} + \sum_j h_j^2)}} \qquad (i \text{ in } K';\ j \text{ in } L'). \tag{23} \]

Dividing numerator and denominator of equation (23) by $a_{K'} a_{L'}$, we obtain

\[ r_{KL} = \frac{r_{K'L'}}{\sqrt{\left(1 - \dfrac{n_{K'} - \sum_i h_i^2}{a_{K'}^2}\right)\left(1 - \dfrac{n_{L'} - \sum_j h_j^2}{a_{L'}^2}\right)}} = C_{K'}\,C_{L'}\,r_{K'L'}, \tag{24} \]

where $C_{K'}$ and $C_{L'}$ are adjustment factors that will have already been
computed in the process of working with equation (21).

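In order to obtain an estimate of $r_{KL}$ prior to the adjustment of the
$r_{iK'}$ and $r_{iL'}$, a slightly less exact variant of equation (24) can be obtained.
Using the same logic that led to equation (14), we can write

\[ r_{KL} = \frac{a_{K'}\,a_{L'}\,r_{K'L'}}{\sqrt{\dfrac{n_{K'}}{n_{K'} - 1}\,(a_{K'}^2 - n_{K'})\;\dfrac{n_{L'}}{n_{L'} - 1}\,(a_{L'}^2 - n_{L'})}}. \tag{25} \]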


By using equation (25) as a first approximation to $r_{KL}$, one can employ the
Doolittle process to be described in the next section in order to check the
possibility that the subtests are linearly dependent. (A necessary condition
for groups in the multiple-group centroid method is that they be linearly
independent.) Using the Doolittle procedure to be described, but without
the horizontal extension, it will be possible to discover which subtests have
leading A-row entries less than, say, .10. Since most of the subtests that
fall in this latter category can have the largest part of their variance
satisfactorily accounted for by other subtests, these subtests are eliminated
from further consideration in order to stabilize the inverse of the matrix
that must be inverted. After the $r_{iK'}$ values have been adjusted, one should
recompute the $r_{KL}$ from the more precise estimate given by equation (24).
The values given by equation (24) should be used in obtaining the
transformation matrix described in the next section.


A Method of Securing the Transformation Matrix for Converting Projections
on Oblique Reference Vectors to Projections on Orthogonal Axes

We now have information identical with that obtained in the regular
group factor method. The explanation given by Thurstone for obtaining the
transformation in terms of matrix manipulation can be clarified and perhaps
made easier to follow when the procedures are viewed in terms of the
part-correlation coefficient.

Dunlap and Cureton (1) give the formula for the part-correlation
coefficient as follows:

\[ r_{iII} = r_{i(B \cdot A)} = \frac{r_{iB} - r_{iA}\,r_{AB}}{\sqrt{1 - r_{AB}^2}}. \tag{26} \]

This formula can be rewritten as

\[ r_{iII} = r_{i(B \cdot A)} = r_{iA}\left(\frac{-r_{AB}}{\sqrt{1 - r_{AB}^2}}\right) + r_{iB}\left(\frac{1}{\sqrt{1 - r_{AB}^2}}\right). \tag{26a} \]

It will be recalled from an earlier section that

\[ r_{iI} = r_{iA} = r_{iA}(1.000) + r_{iB}(.000). \tag{27} \]

These last two equations can be expressed in matrix form as follows:

\[
\begin{bmatrix}
r_{1A} & r_{1B} \\
r_{2A} & r_{2B} \\
\vdots & \vdots \\
r_{iA} & r_{iB} \\
\vdots & \vdots \\
r_{nA} & r_{nB}
\end{bmatrix}
\begin{bmatrix}
1.000 & -r_{AB}/\sqrt{1 - r_{AB}^2} \\
.000 & 1/\sqrt{1 - r_{AB}^2}
\end{bmatrix}
=
\begin{bmatrix}
r_{1I} & r_{1II} \\
r_{2I} & r_{2II} \\
\vdots & \vdots \\
r_{iI} & r_{iII} \\
\vdots & \vdots \\
r_{nI} & r_{nII}
\end{bmatrix}
\]

Or, if we designate the matrix of projections on the oblique axes by P and
the transformation matrix by T, we have

\[ PT = F, \tag{28} \]


where F is the matrix of projections on orthogonal axes. It is our purpose 
to show that the transformation matrix T can be built up by a successive 
part-correlation procedure. 

We start with the table of intercorrelations of the factors, thus

        A         B
A     1.000     r_AB
B     r_AB      1.000

To this we append, on the right, a diagonal matrix with -1.000 values in the
diagonal, thus

        A         B          I         II
A     1.000     r_AB      -1.000      .000
B     r_AB      1.000       .000    -1.000

Performing the usual Doolittle operations on this last expression yields
in literal form

          A          B                  I                    II
A(A)    1.000      r_AB              -1.000                 .000
A(R)   -1.000     -r_AB               1.000                 .000
B(1)               1.000               .000               -1.000
B(2)              -r_AB^2              r_AB                 .000
B(A)             1 - r_AB^2            r_AB               -1.000
B(R)              -1.000       -r_AB/(1 - r_AB^2)     1/(1 - r_AB^2)




In this schematic Doolittle, the notation X(A) indicates a row obtained by
addition; the notation X(R) indicates a row obtained by multiplying by a
negative reciprocal. If one forms the matrix E by using the
X(R)-row entries of the extended portion of the Doolittle table (the last
two columns above) as columns in the matrix E, one would have

\[ E = \begin{bmatrix} 1.000 & -r_{AB}/(1 - r_{AB}^2) \\ .000 & 1/(1 - r_{AB}^2) \end{bmatrix}. \tag{29} \]

Comparison of the matrix E with the matrix T will show that they are
identical except for the denominators in the second column. To complete
the identity, it will be necessary to set up a diagonal matrix, D, by using
the square roots of the leading X(A)-row entries in the Doolittle as diagonal
entries, thus

\[ D = \begin{bmatrix} 1.000 & .000 \\ .000 & \sqrt{1 - r_{AB}^2} \end{bmatrix}. \tag{30} \]

We now have the relationship

\[ ED = T. \tag{31} \]


Problems involving more than two factors can be solved in identical
fashion by the inclusion in the extended portion of the Doolittle of as many
entries as there are factors. The X(R)-row entries of this extended matrix
will form the columns of the E matrix. The square roots of the X(A)-row
leading entries will form an expanded diagonal matrix D. The expanded
transformation will be given by equation (31). The proof that the elements
in the matrix product PT are actually part-correlation coefficients can be
readily established by writing out the complete literal solution for the general
case.
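
For the two-factor case the relationship can be verified numerically. The sketch below builds E and D from equations (29) and (30), forms T = ED as in equation (31), and compares the second element of a row of PT with the part-correlation formula (26). The factor correlation and the item projections are hypothetical values chosen only for illustration, and the function names are not from the text.

```python
import math

def transformation_matrix(r_ab):
    """T = E D for two oblique factors, following equations (29) to (31)."""
    one_minus = 1.0 - r_ab ** 2
    e = [[1.0, -r_ab / one_minus],
         [0.0, 1.0 / one_minus]]                 # equation (29)
    d = [[1.0, 0.0],
         [0.0, math.sqrt(one_minus)]]            # equation (30)
    return [[sum(e[i][k] * d[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def part_correlation(r_ia, r_ib, r_ab):
    """Equation (26): part correlation of item i with B, A removed from B."""
    return (r_ib - r_ia * r_ab) / math.sqrt(1.0 - r_ab ** 2)

r_ab = 0.30                    # hypothetical correlation between factors A and B
p_row = [0.50, 0.40]           # hypothetical projections r_iA, r_iB of one item
t = transformation_matrix(r_ab)
f_row = [sum(p_row[k] * t[k][j] for k in range(2)) for j in range(2)]

print(f_row, part_correlation(p_row[0], p_row[1], r_ab))
```

The first element of the transformed row is simply $r_{iA}$, as equation (27) requires, and the second coincides with the part correlation.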

One word of warning appears necessary. If any X(A)-row entry begins 
to approach zero (say becomes .20 or less), the probabi в => я error 
remains (if reliability is greater than .80, the remainc à xs e M. 
Any variable for which this happens should be dropped throughout the entire 
Doolittle, since all of its real variance is already accounted for by other 
tests already included. 


Use of Tetrachorics in Place of Product-moment Correlations 
in Factoring Items 


In the development presented so far, it has been assumed that the
matrix of inter-item correlations to be factored has as elements
product-moment correlations. Equation (7), which is basic to all of the development,
holds only for product-moment correlations. However, because of the
dichotomous nature or highly skewed distribution of responses in many types
of items, it would be more appropriate empirically to take as the starting
point for the analysis a matrix of inter-item tetrachorics. Wherry and
Gaylord (10) have summarized the major arguments for the use of tetrachorics
in factor analysis of items. In order to use tetrachoric correlations as the
starting point, certain modifications in the procedures developed above are
necessary.

The adjustments to be described in this section are needed only when
iterative corrections to the communalities are made. In the practical
applications of the method this iteration is not actually necessary if the number
of items in a subtest is ten or more. Before developing the basis for these
adjustments let us examine the tetrachoric function. It is derived from a
normal bivariate distribution and therefore assumes that the underlying
variables are each continuous and normally distributed. Implicit also is the
assumption that the regression is linear. If the variables satisfy these
conditions, the product-moment correlation and the tetrachoric correlation are
identical and equation (7) is valid for both types of coefficients. The question
arises as to whether this equation is an estimator of an empirically determined
tetrachoric. In other words, if the $r_{ij}$ on the right side of equation (7)
are tetrachorics, is the left side of this equation a good estimate of the
corresponding tetrachoric obtained by direct computation?

The theoretical answer to this question would be in the affirmative
if the implicit assumptions were reasonably fulfilled. Some of these
assumptions appear, however, to be quite drastically violated when one is dealing
with relationships between test items. Rather extensive checks on equation
(7) have been made using tetrachorics on the type of data one is likely to
encounter in the area of testing and scaling problems. These empirical checks
(12) indicate that observed tetrachorics computed from the same data
are of the same order of magnitude as those obtained by using tetrachorics
in equation (7). Further, the two sets of data correlate approximately .90.
Widely varying difficulties as well as diverse item-types were included in
these investigations. All of the tetrachorics were computed from dichotomies
as near the median as the distributions of the variables would permit; some
of these dichotomies, however, had to be as extreme as 15-85. Considering
the fact that a shift in the point of dichotomization will change the correlation
for individual fourfolds, the estimate given by equation (7) is remarkably
good. It would seem, therefore, that the violations of the assumptions implicit
in equation (7) are not great enough to prevent obtaining unbiased estimates
of tetrachorics by entering tetrachorics in equation (7), at least for the data
that one is likely to obtain when working in this area.

Let us now turn to a closely related aspect of this problem. In the course
of the development, we set

\[ a_{K'}^2 = \sum\sum r_{ij} + n_{K'}. \]




Because the tetrachoric correlations will in general be larger than the
corresponding product-moment correlations obtained from items which are
dichotomously scored or items which have only a few categories, the ratio
of the variances, which estimates the sum of the product-moment correlations,
will tend to underestimate the sum of the corresponding tetrachorics. What
one needs, therefore, as an estimate of the sum of inter-item tetrachorics
is the ratio of the variance of subtest to item under the assumption of
continuous, bivariate normally distributed items. In order to arrive at this estimate,
let us first see if we can obtain an estimate of $r_{x'y'}$ from $r_{xy}$, where
it is assumed that the latter variables have been grouped into broad
categories but are basically continuous, bivariate normal.

Let us consider two cases. Under Case 1 we will assume that the items
have been dichotomously scored. Under Case 2 the assumption will be made
that the number of scoring categories for each item is three or more. In
both cases we make the assumption that all pairs of items have bivariate
normal distributions.


Case 1 


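Under the assumption that the point of dichotomization of both variables
is at the median, the tetrachoric correlation coefficient is given by

\[ r_{tet} = \sin\left[\frac{\pi}{2}\cdot\frac{ad - bc}{\sqrt{(a + b)(c + d)(a + c)(b + d)}}\right], \]

where a, b, c, d refer to cell frequencies in a fourfold. But under the assumption
of median splits on both variables we have

\[ r_{\phi} = \frac{ad - bc}{\sqrt{(a + b)(c + d)(a + c)(b + d)}} = \frac{ad - bc}{(N/2)^2}, \]

N being the total frequency. Thus, we have

\[ r_{tet} = \sin\left(r_{\phi}\,\frac{\pi}{2}\right). \tag{32} \]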
Equation (32) yields a good approximation of $r_{tet}$ from $r_{\phi}$ when the points
of dichotomization are close to the median. (Empirical checks indicate that
the approximation is good even if the dichotomies are as extreme as 30-70.)
The ratios $r_{tet}/r_{\phi} = G_{\phi}$, as computed from equation (32), are given
in Table 2. From Table 2 it will be noted that changes in $G_{\phi}$ are relatively
small for rather large changes in $r_{\phi}$. Hence within restricted ranges of $r_{\phi}$,
without introducing appreciable error, $G_{\phi}$ can be considered to be constant.




TABLE 2

Values of r_tet / r_phi = G_phi

r_phi    G_phi
 .05     1.56
 .10     1.56
 .15     1.56
 .20     1.54
 .30     1.51
 .40     1.47
 .50     1.41
 .60     1.35
 .70     1.27
 .80     1.19
 .90     1.09
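
The entries of Table 2 can be regenerated from equation (32). The following is a minimal sketch; the function name `g_phi` is hypothetical, and it assumes median dichotomies, for which the tetrachoric is the sine of $\pi/2$ times the phi coefficient. The computed ratios agree with the tabled values to within about .01, the differences being a matter of rounding.

```python
import math

def g_phi(r_phi):
    """Ratio r_tet / r_phi under equation (32), for a nonzero r_phi."""
    return math.sin(math.pi * r_phi / 2.0) / r_phi

for r in (0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90):
    print(f"{r:.2f}  {g_phi(r):.3f}")
```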


To return to the problem of estimating a sum of tetrachorics from a
sum of phis, under the model used to derive equation (32) we have

\[ \sum r_{tet} = \sum G_{\phi}\, r_{\phi}. \]

The sum that we are interested in estimating is obtained from the
intercorrelation of variables that are considered to belong to a group in the
multiple-group centroid method of factoring. Hence, the range of inter-correlations
within this sum is not likely to be large (i.e., not more than 20 correlation
points). One is reasonably sure, therefore, that a good estimate of the sum
over inter-item tetrachorics within subtests (assuming the model) could be
obtained from

\[ \sum r_{tet} = G_{\bar{\phi}} \sum r_{\phi}, \tag{33} \]

where $G_{\bar{\phi}}$ is the ratio associated with $\bar{r}_{\phi}$. This latter value can be estimated
by calculating sample values of the inter-item correlations.

It should be noted that a check on the appropriateness of the $G_{\bar{\phi}}$ value
can be obtained in the process of estimating $\bar{r}_{\phi}$ by also computing the
corresponding tetrachorics. To test the empirical validity of equation (33)
when the underlying assumptions in the model are not completely met,
inter-item tetrachorics as well as phi coefficients were computed for a series
of eight subtests. These subtests were made up of attitude-type items in a
leader-behavior description questionnaire (12). The subtests each had
approximately 15 items and $\bar{r}_{\phi}$ for the subtests ranged from .20 to .40. The ratios
of the sums of the tetrachorics to the corresponding sum of the phis ranged
from 1.82 to 1.31. Many of the items in these subtests could be dichotomized
only outside of the range 30-70; yet the agreement between empirically and
theoretically determined $G_{\bar{\phi}}$ was above r = .90. It would seem that the
underlying model is applicable even though some of the assumptions are
far from being met.




It should be noted that this adjustment involves substituting
$[G_{\bar{\phi}}(a_{K'}^2 - n_{K'})]$ for $(a_{K'}^2 - n_{K'})$ in those formulas in which the latter
expression appears. Thus equation (11) becomes

\[ S_{K'} = G_{\bar{\phi}}\,(a_{K'}^2 - n_{K'}). \tag{34} \]

One must also replace $a_{K'}$ by its adjusted value, $\sqrt{S_{K'} + n_{K'}}$.
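
The adjustment can be sketched as below. The function name `adjusted_subtest_values` is hypothetical, and the value G = 1.5 is only an illustrative ratio of the size shown in Table 2, not a value taken from the text; the subtest figures are those of the earlier numerical example.

```python
import math

def adjusted_subtest_values(a_sq, n_items, g):
    """Equation (34) and the adjusted a_K' that follows from it."""
    s_adj = g * (a_sq - n_items)            # S_K' = G (a_K'^2 - n_K')
    a_adj = math.sqrt(s_adj + n_items)      # adjusted a_K' = sqrt(S_K' + n_K')
    return s_adj, a_adj

# a^2 = 11.68 and n = 6 as in the numerical example; G = 1.5 is hypothetical.
s_adj, a_adj = adjusted_subtest_values(11.68, 6, 1.5)
print(round(s_adj, 3), round(a_adj, 3))
```

The adjusted values would then take the place of $(a_{K'}^2 - n_{K'})$ and $a_{K'}$ wherever they enter equations (13a) and (21a).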


Case 2

For this case we assume that the items have underlying bivariate normal
distributions, but have been grouped into as many categories as there are
scoring categories. Under restrictions to be indicated, an estimate of the
product-moment correlation that would be obtained if the data were not
grouped into broad categories is given by

\[ r_{x'y'} = \frac{r_{xy}}{r_{xx'}\,r_{yy'}}, \]

where the primes indicate the continuous variables. If the terms in the
denominator are considered to be independent of that in the numerator,
this estimate cannot obviously hold for high values of r. For values of
r in the range from .00 to .40 the error introduced by assuming this
independence is not appreciable. Assuming that each of the categories of the
grouped frequencies is represented by the mean of the category, the
correlation between grouped and ungrouped normally distributed data can be
estimated by the formula given in (6, p. 395).

If we assume that all items have the same number of scoring categories,
and that the categories are comparable, $r_{xx'} = r_{yy'}$. Although $r_{x'y'}$ is not
actually a tetrachoric correlation, it is probably of the same order of
magnitude as a tetrachoric or a slight underestimate thereof.
The theoretical value of $r_{xx'}$ can be readily estimated. The value of
$r_{x'y'}$ can then be computed. The ratios of estimates of the product-moment
correlations based upon continuous variates to those based upon categorized
variates are as follows:

No. of effective
categories         r_{x'y'} / r_{xy}
Continuous              1.00
    8                   1.03
    5                   1.12
    4                   1.19
    3                   1.36
    2                   1.57
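
The two-category entry of this table can be checked with a short sketch. It assumes a standard result for a normal variate grouped at given cut points and scored at the category means: the correlation between the grouped and ungrouped variate is the square root of the sum over categories of (difference of bounding ordinates)^2 / (category proportion). This may differ in form from the formula in (6), the function names are hypothetical, and only the median-split case is checked here, since the cut points behind the other tabled rows are not stated.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def grouped_correlation(cuts):
    """Correlation of a standard normal variate with its mean-scored grouping."""
    edges = [-math.inf] + sorted(cuts) + [math.inf]
    total = 0.0
    for lo, hi in zip(edges, edges[1:]):
        p = normal_cdf(hi) - normal_cdf(lo)          # category proportion
        diff = normal_pdf(lo) - normal_pdf(hi)       # difference of ordinates
        total += diff * diff / p
    return math.sqrt(total)

# Two effective categories: a single cut at the median.
r_xx = grouped_correlation([0.0])
ratio = 1.0 / (r_xx * r_xx)      # r_x'y' / r_xy when r_xx' = r_yy'
print(round(r_xx, 4), round(ratio, 3))
```

For the median split the ratio equals $\pi/2$, which rounds to the tabled 1.57.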




It is important to realize that the number of effective categories differs
from the number of categories that one may set up for scoring purposes;
unless the category has the proportion of frequency corresponding to the
category of which it is supposed to be the counterpart in the normal
distribution, the category cannot be considered "effective." It will be noted that
for two effective categories, within the range .00 to .30, the value of $G_{\phi}$ is
quite close to $r_{x'y'}/r_{xy}$. Indications are that this latter ratio is a slight
underestimate of the counterpart of $G_{\phi}$ for three or more categories, the larger


erestimation. 

ги , the ratio, Toe Тому , be 
/here the diserepancy between 
oretieal estimate differ appre- 
aking a final estimate. Having 


arrived at an adjustment factor, G, the adjustments indicated in the last
paragraph under Case 1 must be made.


Summary of Operations 

1. Test items are divided into what are judged to be relatively
independent subtests by experts. Not all items need be assigned to subtests;
only those items on which experts agree should be included in the subtests.
The number of subtests must be equal to, or greater than, the number of
factors that would be extracted from the matrix of inter-item correlations.

2. All persons are scored on all subtests.

3. The intercorrelations between subtest scores, $r_{K'L'}$, are computed.

Note: Steps 4 and 6 through 10 can be omitted for subtests having
ten or more items.

4. The standard deviation of each subtest is computed. The average
standard deviation of the items in each of the subtests is computed or
estimated.


, are 
у колу Бу use of equation (25). 
of subtests, the 
be used, starting 
intercorrelation With all other 
subtests and progressively adding subtests 

overlap with selected tests 


панн „У 20 ariance is error due to а com- 
bination of unreliability and sampling.) 
6. The item-subtest correlations, r 


y ООШ по о» are computed for all items 
against all subtests retained in step 5. The: 
chorics. 




to projections i 
ar cer on group centroid axes, т,к , by means of equations (18) 
£ U 3) or its equivalent, and iterative use of equation (13a). j 
8. The $r_{K'L'}$ are now converted to better estimates of $r_{KL}$ by means
of equation (24).

9. Step 5 is repeated using values of $r_{KL}$ computed from step 8 in place
of the $r_{KL}$ called for in step 5.

10. For the subtests selected in step 9, the $r_{iK'}$ for items not in these
subtests are converted to $r_{iK}$ by means of equation (21). Note: If item i
belongs to subtest P' but does not belong to subtest Q', $r_{iQ'}$ is subject to
conversion. All items not included in any of the subtests are also subject
to this conversion.

11. From the intercorrelation matrix (matrix of $r_{KL}$) of the selected
group factors, a transformation matrix is computed by means of the extended
Doolittle procedure outlined. Multiplying the matrix of item-group centroid
projections by the transformation matrix yields an estimate of orthogonal
factor loadings.

12. The orthogonal factor loadings are rotated to meaningfulness by any
of the regular rotation methods.


REFERENCES

1. Dunlap, J. W., and Cureton, E. E. On the analysis of causation. J. educ. Psychol.,
1930, 21, 657-680.
2. Gleser, G., Loevinger, J., and DuBois, P. H. Resolution of a pool of items into relatively
homogeneous subtests. Amer. Psychologist, 1951, 6, 401. (Abstract.)
3. Gordon, L. V. A comparison of the validities of the forced-choice and questionnaire
methods in personality measurement. Unpublished Ph.D. dissertation, Ohio State
University, 1950.
4. Loevinger, J., Gleser, G. C., DuBois, P. H., and Berkeley, M. H. A new method for
[...] multiple score tests. Amer. Psychologist, 1951, 6, 303-304. (Abstract.)
5. Lucas, C. M. An emergent category approach to the study of adolescent needs.
Unpublished Ph.D. dissertation, Ohio State University, 1951.
6. Peters, C. C., and VanVoorhis, W. R. Statistical procedures and their mathematical
bases. New York: McGraw-Hill Book Co., 1940.
7. Phelps, H. R. A factor analysis of adolescent informal group activities and attitudes.
Unpublished Ph.D. dissertation, Ohio State University, 1951.
8. Stead, W. H., and Shartle, C. L. Occupational counseling techniques. New York:
American Book Co., 1940, Appendix V, 245-250.
9. Wherry, R. J. Factor analysis of Officer Qualification Form Q.C.L. Form QCL 2b.
Research Foundation, The Ohio State University, Report to Department of the Army,
Feb. 28, 1950.
10. Wherry, R. J., and Gaylord, R. H. The concept of test and item reliability in relation
to factor pattern. Psychometrika, 1943, 8, 247-264.
11. Wherry, R. J., Perloff, R., and Campbell, J. T. An empirical verification of the
Wherry-Gaylord iterative factor analysis procedure. Psychometrika, 1951, 16, 67-74.
12. Winer, B. J. Iterative factor analysis: its psychological and mathematical bases.
Unpublished Ph.D. dissertation, Ohio State University, 1952.

Manuscript received 9/22/52

Revised manuscript received 11/11/52


BOOK REVIEWS 


Vernon, Philip E. The Structure of Human Abilities. Methuen’s Manuals of Modern
Psychology. London: Methuen & Co. Ltd.; New York: John Wiley & Sons, Inc., pp. 160.


Regardless of your factor-analytic faith, regardless of your disposition to worship g,
tolerate it, ignore it, or deny it, you will be pleased with this clear, non-technical exposition
of factor analysis. Professor Vernon writes, “I assume only that the reader has had an ele- 
mentary course in psychology and knows what an intelligence test and a correlation 
coefficient are." He succeeds in holding to this assumption and yet is able to cover with 
admirable lucidity the fundamental concepts of factor analysis, the problems and limitations 
that it is currently facing, and the several conflicting theories. 

This is a fine general description. Someone who has been working with factor analysis
a long time will find the book goes over much familiar ground. In spite of this, such a person
will find it very worth while, especially in demonstrating how the familiar concepts may be
verbalized. The book is very appropriate for students as an introduction to the subject of
factor analysis and for workers in other areas of testing and of psychology who want to
know more about all the argument.

The dedication reads, “to C. Burt and G. H. Thomson (with whom I almost always
agree) and to L. L. Thurstone and the late C. E. Spearman (with whom I usually disagree).”
Vernon indicates at all points his preference for the extraction of g and a few major group
factors. Nevertheless his discussion of the general concepts of factor analysis is entirely
suitable for devotees of all methods, and his comparison of the diverse methods, although
definitely one-sided, is by far the clearest brief explanation of the situation that is available.

The first three chapters explain the general theory of factor analysis and its limitations,
its historical development, the differences among the several methods, and the author's
preferred group-factor method, and its special implications. Considerable attention is paid
to the hierarchical group-factor theory of the structure of abilities. The position is taken
that g heads the hierarchy, that the major group factors verbal:educational (v:ed) and
spatial:mechanical (k:m) are on a second level, that the minor group factors are at a third
level, and that specific factors branch down from there. Vernon admits that the hierarchy
is not perfect; for example, scientific ability cuts across the two major group factors. He
points out that factors at any level can be obtained by means of an appropriate selection
of tests. The reviewer feels that to emphasize this hierarchy 28 à m i e meis = 
structure of abilities will prove to be misleading, since any test can be e at any lev 8 
i ; м ly ing the other tests 1m the battery. The usefulness 
in the hierarchy simply by proper ]y selecting the 


of factor analysis rests, to a large extent, on the existence of bee еј 
tests reflecting certain unities of function. The most poan | as variously extended and 
Structuring of these clusters or factors is то Ө 


si arm t rhaps а tendency for the cog- 
Я aps а ў 
Sized and variously overlapping behavior ву! h pe А g 
q. 


ndromes wi 
nitive kind to overlap a common area f us -analyti 
The first three hant beautifully set forth the rensons T or bebe зен ed 
method, its uses, and its limitations. These are urge tO that chould be Же н 
date worker in the field and cover all the important © jerations that would be equally well 
i у 3 ose are Cons! 4 
mind while using the method. Most of these are le, are discussions of 


А = Here, for examp: 
ned to by users of all factor-analytic e ar eet to human abilities, identifica- 
aculties vs. factors, factor analysis as ап empirical? 


Е arrow gi ti effects of 
tion of factors, limitations of factor analysis, broad and а nr NUR ў 
Tünge on NM patterns, the effect of age on f 




‘actor patterns, 




Some two-thirds of the book is devoted to chapters discussing the factorial findings in
the various areas of testing. These chapters present the results of both British and American
studies in a discursive rather than tabular manner. On the whole they give a very fair picture
of the findings, including a very fair picture of the confusions. The more technically oriented
reader may have occasion to find some fault with these chapters for several reasons:
(1) the results of the various factorial studies are discussed without indication as to method
of analysis, (2) the results are discussed by reference to test names with no further indication
as to what the tests are like, and (3) coverage of studies is only moderate, some areas such
as personality being ignored completely.

A 7-page appendix compares the general-plus-group-factor theories with the multiple-factor
theories. Although devotees of Thurstone's method will not agree with the import of that
appendix, they will find it a remarkably precise statement of the theoretical differences.
Most of the appendix is devoted to seven reasons for the superiority of general-plus-group-
factors. These are given below, each condensed into one sentence. (1) In all but highly
selected populations g is too big to belittle. (Vernon indicates, however, that sometimes we
are interested in selecting populations and that multiple-factor analysis can reveal group
factors in such situations that the general-plus-group-factor method [illegible].)
(2) g and the major group factors are more nearly invariant than [illegible] changing
populations and changing tests. (3) Group factoring [illegible] factoring. (4) The “primary”
factors are so divisible that it is difficult to [illegible] is to stop, except by stopping
with the [illegible] that are useful, presumably either practically or conceptually. (5) Since
no test [illegible] factor and the g or other content must [illegible], why not admit that all
tests involve g, instead of artificially removing it by means of rotation? (6) Hierarchy is not
merely a statistical artifact; it is best understood in terms of general-plus-group-factor
theory. (7) The multiple-factor theories encourage factor naming and the false belief that
tests will predict success on jobs having activities apparently similar to those involved in
the factor, while a short battery of the major group factors, v:ed and k:m, will serve almost
all predictive purposes.

[illegible] view to present the [illegible] points. Perhaps the opposition would have most to
say on the [illegible] contents. Although the reviewer [illegible] school of thought widely
different from that of [illegible], particularly Chapters 1-3 and the appendix, was found by
him to be stimulating and a great [illegible] of a muddled situation.

Educational Testing Service                                                    John W. French


ADKINS, DOROTHY C., AND LYERLY, SAMUEL B. Factor Analysis of Reasoning Tests. Chapel
Hill: Univ. North Carolina Press, 1952, pp. iv + 122.


[illegible] abilities was probably the [illegible]. Neither in the definiti[illegible]
necessarily imposed, by in[illegible] domain, report a project designed “to clarify the
underlying nature of the abilities [illegible]ting performance in types of tests that have
been identified previously or suggested as measures of reasoning” (p. 4).


The report is presented in two sections. In Part I is described a factor analysis of
38 tests, selected on the basis of their factorial content, from the battery included in the
Army Air Forces Psychology Program, Report No. 5, Printed Classification Tests. The
correlations analyzed were those reported in the Air Force study. On the basis of the analy-
sis, 18 tests were chosen for inclusion in the 66-variable study reported in Part II. The other
tests administered for this second, major analysis were chosen from a variety of sources,
and some were developed specifically for this study. In addition to the 65 tests finally
selected, the number of years of formal schooling was included as a variable. Subjects were
200 enlisted men, selected by performance on an Army classification aptitude battery to be
representative of the population of enlisted men in the Army. Following the normalization
of all variables, product-moment correlation coefficients were computed (IBM equipment
was used) and a centroid factor analysis was performed.

Where the aim of a factor analysis project is to discover consistent, meaningful con-
stellations of ability from the interrelations among a group of variables, the editorial selec-
tion of variables is obviously of crucial importance. It is noteworthy that the authors
agreed “to devote a sizeable portion of the available resources to deciding upon, selecting
or constructing, and editing the tests to be used” (p. 4). The literature was systematically
reviewed for test ideas. Individual psychologists and philosophers were invited to submit
test ideas and hypotheses as to the nature of reasoning. There is presented evidence of
careful selection of tests which cover a wide range of reasoning tasks. In addition, tests
measuring non-reasoning abilities were included, at least two such tests for each of nine
previously identified factors, e.g., Verbal Relations, Number, Space factors, Closure factors,
etc.


To prevent the ambiguity of interpretation which arises when a factor is defined by
tests alike in type of mental operation but also alike in medium of presentation, tests con-
sidered to be of similar function were chosen from differing media of presentation. One of a
number of examples of the perspicacity of this approach appears in the interpretation of one
of the reasoning factors, named “Perception of Abstract Similarities.” The factor is defined
by two verbal classification tests, two figure classification tests, both verbal and figure
analogies tests, and a test of analogies of meaningful pictures. It is apparent that the process
underlying successful performance on the tests transcends the medium in which they are
presented, at least within the limitations of a group-administered paper and pencil test
battery.
Sixteen centroid factors were extracted and rotated into oblique simple structure.
For thirteen of these, interpretations are offered. Four reasoning factors are presented. In
addition to the factor “Perception of Abstract Similarities,” reasoning factors have been
named “Hypothesis Verification” (best defined by the series of Raven's Progressive Matrices
tests), “Deduction,” the ability to [illegible] (best defined by [illegible] Premises
and Identical Forms tests), and “Concept Formation” (best defined by tests demanding
that the subject assign to a group of objects pictured or named the name of the narrowest
category which subsumes all objects). Also suggested to be allied to reasoning is “Flexibility
of Perceptual Closure,” Thurstone's second closure factor, one of the nine reference factors
in the analysis.

The interpretations which appear in the book, in general, are convincing. They depend
[illegible] tests exhibiting high factor loadings, but also upon the nature of tests not
exhibiting high factor loadings; it is often of [illegible] importance to [illegible]
“Why not?” [illegible] draw correct [illegible]. [illegible] apparently a tacit recognition
of the provisional character [illegible] inherent limitations of factor analysis, upon
interpretations of rotated factors. The frequent references to earlier studies are of
considerable aid to the reader in establishing similarities between factors here reported
and those identified in previous investigations of mental abilities. Differences, too, are
reported, particularly with respect to the Air Force studies. There is discovered no
correspondence between the characteristics of the several reasoning factors discussed in the
Air Force Report No. 5 and those of the reasoning factors isolated here. To the reviewer the
interpretations of the present study seem to provide a more satisfactory picture of reasoning
abilities and the interpretations make good sense, psychologically. However, further
investigation of the discrepancies certainly is warranted.

In most respects, this book is extremely comprehensive. Each test is described suc-
cinctly in terms of content, time limits, scoring formula, etc., and both raw score and nor-
malized score frequency distributions are exhibited. Complete tables of test intercorrela-
tions and of 16th-factor residuals are presented, in addition to tables of centroid and oblique
factor loadings, the transformation matrix, and the matrix of cosines of reference vectors. 
Useful information which might have been presented, but is not, includes the distribution 
of number of items completed on each test (from which it would be possible to obtain an 
estimate of the level of chance performance) and graphical representation of pairs of reason- 
ing factors (to supply pictorial guidance for the assessment of interrelations among these 
factors). 

This work provides considerable advance toward the goal of organizing our knowledge 
of reasoning abilities. The study supplies a framework of hypotheses, the confirmation or 
revision of which might be expected to lead directly to stable primary abilities of reasoning. 
In addition to serving as a guide valuable to both theoreticians and practitioners interested 
in the measurement of intellective functions, the study serves as an example of one of the
most fruitful applications of factor analysis methods. 


University of Chicago                                                          Lyle V. Jones


JEFFRESS, LLOYD A. (Ed.), Cerebral Mechanisms in Behavior, The Hixon Symposium.
New York: John Wiley, 1951, pp. xiv + 311, $6.50.


This book contains the papers given during the Hixon Symposium at the California 
Institute of Technology in September, 1948. Following each paper is an edited transcript
of the discussion.

The first paper, by John von Neumann, is “The General and Logical Theory of
Automata.” Dr. von Neumann runs through the similarities and some of the critical dif-
ferences between artificial and natural automata, between computing machines and the
central nervous system. He concludes that the inferiority of our materials and the absence
of any adequate theory prevent us from attaining the high degree of complication and the
small dimensions of natural automata. The McCulloch-Pitts theory, built on the present
system of formal logic, is inadequate. A new logic is needed whose procedures allow a low
but non-zero probability of errors. Turing’s results are extended to a theory for self-
reproducing automata. The paper was received by the other participants with skeptical
remarks like the following:


McCulloch, “I envy Dr. von Neumann the fact that the machines with which he has to
cope are those for which he has, from the beginning, a blueprint of what the machine is
supposed to do and how it is supposed to do it.”


Gerard, “I have had the privilege of hearing Dr. von Neumann speak on various occasions,
and I always find myself in the delightful but difficult role of hanging on to the tail of a
kite.”


Weiss, “I question whether a mechanism in which all these innumerable contingencies have 
been foreseen, and the corresponding corrective measures built in, is actually conceivable.” 


Lashley, “It seems to me the question of precision of the organic machine has been somewhat 
exaggerated.” 
Halstead, “I suspect that von Neumann biases his automata towards rationality by careful 
regulation of the energies of the substrate.” 
Lorente de Nó, “Possibly the automaton can be made to maintain memory, but the auto- 
maton that does would not have the properties of our nervous system.” 

Warren S. McCulloch presented the second paper, “Why the Mind is in the Head.” 
He asserts that the nervous system is par excellence a logical machine. It is a highly redun-
dant machine because information handling capacity is sacrificed for dependability. The
notion of negative feedback is considered to be neurophysiologically important. Finally,
McCulloch reviews in some detail the neural circuits he has proposed for form perception.


This paper evoked such remarks as: 


Lorente de Nó, “Dr. McCulloch has brought what we know of both the anatomy and the 
physiology of the brain closer to an integrated whole than it has ever been before.” 


von Neumann, “I see the plausibility of what you say, but I still have a residue of uncer-
tainty left.”


Gerard, “If these networks of neurons are organized so beautifully in the striate, then how
do you account for some of Dr. Lashley's critical experiments on destruction of different
parts of the brain?”


Köhler, “I admire the courage with which Dr. McCulloch tries to relate his neurophysiology
to facts in psychology. But I sometimes feel like criticizing the results.”


Lashley, “I am very much in sympathy with the type of development represented in the
last two papers. At the present time, however, such a formulation involves a great over-
simplification of the problems.”

The third paper, by Lorente de Nó, had to be omitted. The fourth, “The Problem of
Serial Order in Behavior,” was given by K. S. Lashley. Lashley argues that the temporal
organization of behavior has never been properly considered. The notion of chains of
associated reflexes is not adequate. A variety of examples, most of them linguistic, lead
Lashley to consider a “priming” mechanism that gets responses ready before they occur.
Temporal order is probably closely related to spatial order. The other participants com-
mented:


Klüver, “In my opinion, this is the first time since 1914 that a neurological thinker has
presented such a trenchant analysis of the role of the time factor in behavior.”


Halstead, “I have been greatly impressed with the case that Dr. Lashley has made for non-
specific, non-mosaic representation.”


Gerard, “I find it impossible to think through or even [illegible] towards the complexities
of behavior if restricted to atomic units travelling along atomic fibers.”


Lorente de Nó, “While I was listening there was going through my head a mental picture of
a number of experiments that I intend to perform, suggested to me by Dr. Lashley’s
speech.”


Weiss, “The great value of Lashley’s presentation lies in the fact that it places rigorous
limitations upon the free flight of our fancy in designing models of the nervous system.”



The fifth paper, by Heinrich Klüver, was “Functional Differences between the
Occipital and Temporal Lobes.” Klüver reviews his work on the occipital and temporal
lobes. Removing the occipital lobe causes a monkey to behave as though his eye were a
simple photocell which records only changes in light flux. Removing the temporal lobes
does not produce much sensory effect but causes remarkable change in behavior. Klüver
then calls attention to extracerebral mechanisms that exert obscure influences on the brain 
and illustrates them by his own work on the role of porphyrins in the central nervous system. 
Sample comments were: 


McCulloch, “Each time we get one of these problems in which we are concerned on the one 
side with chemistry, and on the other side with the structure of the nervous system, we get 
into difficulties which take us years and years to solve." 


Gerard, “I am particularly grateful to Dr. Klüver for, in a sense, putting the brain back in
the body.”


Köhler, “I have perhaps missed the connection between the two parts of Dr. Klüver's 
paper." 

Wolfgang Köhler read the sixth paper, “Relational Determination in Perception.”
He begins with a review of his experiments on figural after-effects and argues that they
should be interpreted in terms of direct currents flowing through the brain tissue. This
argument led to experiments searching for such direct currents. Some reactions to this
paper were:


Lashley, “I am at a loss to see where further development of the theory will lead.”


Lorente de Nó, “From looking at your records I don't see any reason why they are not
perfectly legitimate records and why we are not now in the presence of a new phenomenon
in physiology.”


Gerard, “It is somewhat to the shame of physiologists that the spontaneous rhythm of the
human brain was discovered by a psychiatrist—the Berger rhythm. Now, again, it is not
a physiologist, but a psychologist who has had the courage to try a reasonable gamble and
look for his still slower changes directly in the human brain. I am much inclined to think
that he has found them.”


Liddell, “How do you propose to follow this clue of the slowly fluctuating cortical potentials 
when you change over to the kinesthetic and tactile fields?” 

The seventh paper, “Brain and Intelligence,” was given by Ward C. Halstead. His
paper follows along many of the ideas of his book Brain and Intelligence and treats the
effects of lesions on intelligence, the factors in biological intelligence, the role of the frontal
lobes, etc. His paper evoked such comments as:


Lashley, “I think this is the most promising method of approach to the whole problem of
cerebral localization that has been made.”


Nielsen, “Dr. Halstead is the only psychologist that I have ever heard of who can tell by
his psychological tests that the frontal lobes have been taken out.”


Klüver, “Dr. Halstead’s intensive analysis has thrown new light on the functional signifi-
cance of the frontal lobes.”


Lindsley, “I am sure that the stimulation of Dr. Halstead's work will direct a number of
psychologists into this kind of application.”


The volume closes with a review of the symposium from the viewpoint of a clinician,
Henry W. Brosin. He says the great strength of the group is their willingness to tolerate
partial answers, proposes that color responses on the Rorschach test should be of especial 
interest to neurophysiologists, and wonders if psychology will not find its “great man” in 
the person who can combine the concepts of Freud with the ideals of Wundt. These remarks 
by Dr. Brosin were extemporaneous. 

If this review gives a somewhat confused picture of the book, then it correctly sum-
marizes the reviewer's impression. The papers are uniformly good and will be useful to give
graduate students an introduction to the thinking of these famous scientists. The discussion 
is heterogeneous, sometimes inaccurate, seldom documented, and usually disorganized. 
The possibility of a general theory of behavior based on cerebral mechanisms looks faintly 
hopeful at first, but deteriorates as the symposium progresses. At least one reader closed 
the book with the impression that the study of behavior has much more to contribute to 
our knowledge of cerebral mechanisms than vice versa. 


Massachusetts Institute of Technology G. A. Miller 


NORMAN FREDERIKSEN AND W. B. SCHRADER, Adjustment to College. Princeton: Educa-
tional Testing Service, 1951, pp. xvii + 504.


Soon after the veterans began to pour into our colleges and universities at the end of 
World War II, educators started to deliver opinions and research workers analyses of data 
about veterans’ adjustment to college. The large number and complexity of the factors 
involved cast doubts on both the opinions and the analyses of data. Opinions were too 
vulnerable to the effects of sentimental and financial considerations. Virtually all of the 
reported studies failed to control one or more of such relevant factors as year in class, pre-
dicted academic performance (e.g., high-school rank and college aptitude score), division of
the college in which the student was enrolled, and the specifics associated with one institu-
tion as compared to others.


It is fortunate, then, that this study of a well-planned sample of sixteen colleges and
universities was made possible through the financial assistance of the Carnegie Corporation
and the consultative resources of the Educational Testing Service, of which the authors are
staff members. Here we have a definitive answer based upon a sophisticated analysis of the
question.

Not only was academic achievement, through the medium of grades, investigated but
a questionnaire was administered dealing with facts of personal history and status, atti-
tudes toward college and college grades, worries and anxieties, use of time, and factors
bearing on the importance of the “GI Bill” in determining college attendance. The ques-
tionnaire was administered in the fall of 1946 and a sample of approximately 11,000 dis-
tributed through the sixteen institutions was drawn.

Through an application of covariance analysis which permitted the use of an index
representing the variance in grades unaccounted for by [illegible] aptitude and achievement,
ability was ruled out as a [illegible] and non-veterans. These two groups were compared
[illegible] by sex, class, and division. They find that the [illegible] of equal ability is
supported. For freshmen, however, [illegible] most extreme instances (groups), the
[illegible] only no more than [illegible] analyses of the questionnaire responses, but
[illegible] differences between veterans and non-veterans in motivational adjustment are
revealed. Veterans' worries, if anything, are fewer than the non-veterans', though somewhat
differently distributed. The veterans were more concerned
about financial problems and concentration, while non-veterans were more concerned about 
feelings of inferiority and social adjustment. Of special interest from the point of view of 
national educational policy are findings related to non-aptitude determiners of going to 
college. The veterans were drawn from families of less educational background and lower
income than their non-veteran counterparts. At the same time students who are older and 
from lower socio-economic groups tend to be overachievers. Specificity and certainty of 
vocational choice were other outstanding factors in overachievement. 

The meticulous interpretation of data is marred by one instance. The lack of correla- 
tion between date of testing and test scores and grade achievement is taken as evidence 
that “the time of taking the test has little effect on the predictive value of the test” (p. 59). 
This lack of correlation with date of testing does not preclude the possibility that the correla-
tion of test scores taken a year earlier with grades will be lower than the correlation of
test scores taken at the start of the current year. However, this fault is a minor one in an
otherwise well-planned, thoroughly analyzed study.


University of Michigan Edward S. Bordin 


BOOKS RECEIVED 


ASHBY, W. ROSS. Design for a Brain. New York: John Wiley and Sons, Inc., 1952, pp.
ix + 260.

BARLOW, FRED. Mental Prodigies. New York: Philosophical Library, 1952, pp. 256.

BERRIEN, F. K. Practical Psychology (Revised Edition). New York: The Macmillan Co.,
1952, pp. xv + 640.

Bureau of Psychology, U.P., Allahabad. An Educational Guidance Project. Allahabad:
Bureau of Psychology, Uttar Pradesh, 1952, ii + 82.

CHESSER, EUSTACE. Cruelty to Children. New York: Philosophical Library, 1952, pp. 159.

COCHRAN, WILLIAM G. Sampling Techniques. New York: John Wiley and Sons, Inc., 1953,
xiv + 330.

COOMBS, CLYDE H. A Theory of Psychological Scaling. Engineering Research Bulletin No. 34.
Ann Arbor: Univ. Michigan Press, 1952, pp. vi + 94.

DEMING, W. E. Some Theory of Sampling. New York: John Wiley and Sons, Inc., 1950,
pp. xvii + 602.

EDUCATIONAL TESTING SERVICE. A Summary of Statistics on the Selective Service College
Qualification Test. Princeton: Educational Testing Service, 1952, pp. 71.

GOULDEN, CYRIL H. Methods of Statistical Analysis. New York: John Wiley and Sons, Inc.,
1952, pp. vi + 440.

HYNDMAN, OLAN R., M.D. The Origin of Life and the Evolution of Living Things. New York:
Philosophical Library, 1952, pp. xxi + 648.

KARPF, FAY B. The Psychology and Psychotherapy of Otto Rank. New York: Philosophical
Library, 1953, ix + 129.

LANDIS, PAUL H., AND STONE, CAROL L. The Relationship of Parental Authority Patterns to
Teenage Adjustments. Bulletin No. 538. Washington Agricultural Experiment Stations.
Pullman, Wash.: State College of Washington, 1952, pp. 31.

MAIER, NORMAN R. F. Principles of Human Relations: Applications to Management. New
York: John Wiley and Sons, Inc., 1952, pp. ix + 474.

PALMER, HAROLD, M.D. The Philosophy of Psychiatry. New York: Philosophical Library,
1952, pp. ix + 70.

PODOLSKY, EDWARD (Editor). Encyclopedia of Aberrations. New York: Philosophical
Library, 1953, viii + 550.

RAO, C. RADHAKRISHNA. Advanced Statistical Methods in Biometric Research. New York:
John Wiley and Sons, Inc., 1952, pp. xvii + 390.

RIESE, WALTHER. The Conception of Disease: Its History, its Versions and its Nature.
New York: Philosophical Library, 1953, pp. 120.

ROBACK, A. A. A History of American Psychology. New York: Library Publishers, 1952,
pp. xiv + 426.

TIPPETT, L. H. C. The Methods of Statistics. New York: John Wiley and Sons, Inc., 1952,
pp. 365.

TRAXLER, ARTHUR E.; JACOBS, ROBERT; SELOVER, MARGARET; AND TOWNSEND, AGATHA.
Introduction to Testing and the Use of Test Results in Public Schools. New York:
Harper Brothers, 1953, x + 113.

WILDER, RAYMOND L. Introduction to the Foundations of Mathematics. New York: John
Wiley and Sons, Inc., 1952, pp. xiv + 305.

WOLFLE, DAEL (chairman), et al. Improving Undergraduate Instruction in Psychology.
New York: The Macmillan Co., 1952, pp. vi + 58.




PSYCHOMETRIKA—VOL. 18, No. 3 
SEPTEMBER, 1953 


APPLICATION OF A LARGE SAMPLING CRITERION TO SOME 
SAMPLING PROBLEMS IN FACTOR ANALYSIS* 


DAYLE D. RIPPE
OPERATIONS ANALYSIS OFFICE, STRATEGIC AIR COMMAND

A technique is presented to test the completeness of factor solutions
and also to test the significance of common-component loadings. The chi-
square test involved is based upon the asymptotic normal properties of the
residuals.


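1. Introduction. Consider a k-variate universe in the variables $y_i$ and
a random sample of size N from this universe. Denote

$$\bar{y}_i = \frac{1}{N} \sum_{\alpha=1}^{N} y_{i\alpha}, \tag{1}$$

and consider the new variable

$$x_{i\alpha} = y_{i\alpha} - \bar{y}_i. \tag{2}$$

Then form the covariances of these variables in the usual way,

$$m_{ij} = \frac{1}{N-1} \sum_{\alpha=1}^{N} x_{i\alpha} x_{j\alpha}. \tag{3}$$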
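Equations (1) through (3) can be sketched numerically as follows. This is a minimal illustration, not the author's computing procedure: the function name `sample_covariance` and the simulated data are hypothetical, and the divisor N - 1 is an assumption inferred from the later statement that the $m_{ij}$ are unbiased estimates.

```python
import numpy as np

def sample_covariance(y):
    """Return the k x k matrix (m_ij) for an (N, k) array of observations."""
    y = np.asarray(y, dtype=float)
    N = y.shape[0]
    y_bar = y.mean(axis=0)       # eq. (1): variable means
    x = y - y_bar                # eq. (2): deviates from the means
    return x.T @ x / (N - 1)     # eq. (3); divisor N - 1 is an assumption

# Hypothetical data: N = 200 observations on k = 4 variables.
rng = np.random.default_rng(0)
y = rng.normal(size=(200, 4))
m = sample_covariance(y)
```

The result agrees with `np.cov(y, rowvar=False)`, which uses the same divisor by default.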


The various techniques of factor analysis mathematically form a linear
transformation of the k variables $x_i$ into the form

$$x_i = a_{i1}F_1 + a_{i2}F_2 + \cdots + a_{is}F_s + a_i U_i, \tag{4}$$

where the $F_p$ are called common components and the $U_i$ are called unique
components. If the new variables are selected such that their sample means
are zero and sample variances are unity, and further such that the unique
components are not correlated among themselves nor with the common
components, then in terms of these new variables, (3) becomes

$$m_{ij} = \sum_{p=1}^{s} a_{ip}a_{jp} + \sum_{p=1}^{s-1}\sum_{q=p+1}^{s} (a_{ip}a_{jq} + a_{iq}a_{jp})\,r_{pq} + \delta_{ij}a_i a_j, \tag{5}$$

where

$$r_{pq} = \frac{1}{N-1}\sum_{\alpha=1}^{N} F_{p\alpha}F_{q\alpha}, \tag{6}$$

and where $\delta_{ij}$ is the Kronecker delta. Finally, when the common components
are chosen in such a way that $r_{pq}$ is zero, then (5) becomes

$$m_{ij} = \sum_{p=1}^{s} a_{ip}a_{jp} + \delta_{ij}a_i a_j. \tag{7}$$

*The research work on which the results presented are based was conducted under
the supervision of Prof. P. S. Dwyer [illegible] Department, University of Michigan.
The complete results of this research [illegible] in a Ph. D. thesis, June, 1951.



The same technique may be applied to the standard deviates instead
of the deviates of (2); that is, to $x_{i\alpha}/\sqrt{m_{ii}}$.


In this case sample correlation coefficients rather than the sample covariances 
are involved. The theory developed in the next sections applies to either the 
correlation coefficients or the covariances. Maintaining the covariance 
symbol defined in (3) emphasizes the generality of the method. 

Using a matrix notation:

$$\begin{aligned}
m &= (m_{ij}) && (k \times k \text{ covariance matrix}),\\
a_{ks} &= (a_{ij}) && (k \times s \text{ common component loading matrix}),\\
a_k &= (a_i) && (k \times k \text{ diagonal unique component loading matrix}),
\end{aligned} \tag{8}$$

the matrix equation for (7) is

$$m = a_{ks}a_{ks}' + a_k a_k', \tag{9}$$

where the primes indicate matrix transposes.
The matrix

$$m - a_k a_k' = a_{ks}a_{ks}' \tag{10}$$

is then actually factored by this technique, and thus the idea of matrix
factorization is introduced into the problem. The number of linearly inde-
pendent columns, s, in $a_{ks}$ is equal to the rank of $m - a_k a_k'$. In order to
satisfy (10) within rounding error, it will usually be required that s = k.
However, the quantities $m_{ij}$ are subject to sampling variation, and it is
desirable to carry the component solution only to the point that (10) is
satisfied within that variation. In other words, if the residual matrix is de-
noted

$$(m - a_k a_k') - a_{ks}a_{ks}',$$

the solution should be carried only to the point that this residual matrix is
zero within possible sampling variation. Any components obtained beyond
this point will be regarded as insignificant.
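The relation between (9), (10), and the residual matrix can be illustrated with a small numerical sketch. The loadings below are invented for illustration only; the names `a_ks`, `a_k`, and `residual_matrix` are hypothetical, and the unique loadings are stored as a vector rather than as a diagonal matrix.

```python
import numpy as np

def residual_matrix(m, a_common, a_unique):
    """Return (m - a_k a_k') - a_ks a_ks' for k x s common loadings and
    a length-k vector of unique loadings."""
    m = np.asarray(m, dtype=float)
    a_common = np.asarray(a_common, dtype=float)
    a_unique = np.asarray(a_unique, dtype=float)
    reduced = m - np.diag(a_unique ** 2)       # left side of eq. (10)
    return reduced - a_common @ a_common.T     # zero when eq. (10) holds exactly

# Hypothetical loadings for k = 4 variables and s = 2 common components.
a_ks = np.array([[0.8, 0.1],
                 [0.7, 0.3],
                 [0.2, 0.9],
                 [0.1, 0.6]])
a_k = np.sqrt(1.0 - (a_ks ** 2).sum(axis=1))   # unit variances assumed

m = a_ks @ a_ks.T + np.diag(a_k ** 2)          # eq. (9)
rank = np.linalg.matrix_rank(m - np.diag(a_k ** 2), tol=1e-10)
residual = residual_matrix(m, a_ks, a_k)
```

Because this `m` is built exactly from two common components, the reduced matrix has rank 2 and the residual vanishes. With a sample matrix the residual would be nonzero, and the question taken up in the text is whether it is zero within sampling variation.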


A number of different tests have been given for determining the number
of significant components. In most cases they are designed for particular
types of factor solutions. Among the writers who have proposed tests are
Coombs (2), Hoel (5, 6), Holzinger and Harman (7), Hotelling (8), and
Lawley (9, 11). In this paper a technique comparable to that proposed by
Lawley (9) is developed with a slightly different interpretation concerning
the basis for the significance test, in order to provide a wider range of appli-
cation. The theory goes back to the sampling distribution of the covariances,
and, in fact, as special cases large sampling tests for the significance of various
correlation coefficients can be obtained.


2. Interpretation of the Sampling Problem and Development of the Large
Sampling Criterion for Completeness of Factorization. In order to arrive at a
test for the significance of components in matrix factorization, it is first
necessary to define a significant component. Consider an arbitrary orthogonal
matrix factorization of the sample covariance matrix which results in (10)
or equivalently in (7) for all i, j. Define a significant component in the follow-
ing way: If it is hypothesized that

$$\mu_{ij} = \sum_{p=1}^{s} a_{ip}a_{jp} + \delta_{ij}a_i a_j, \tag{11}$$

and if the sample covariance matrix can be regarded as the covariance matrix
of a random sample from the population whose covariance matrix has as a
typical element $\mu_{ij}$, then there are only s significant common components
present in the sample covariance matrix. This definition of significant com-
ponent fits quite well into the picture of the general usage of factor analysis.
That is, one deals with a sample covariance (or correlation) matrix in the
factor process, and the interpretation of the results is usually in terms of
the population from which the random sample is assumed to have been
drawn. Hence, the hypothesis of (11) is inherent in the interpretation of
the results whether or not the sampling criterion is applied.

In order to provide the sampling formula required, assume that the
original variables have a joint k-variate normal probability distribution whose
density function is

$$f(x_1, x_2, x_3, \ldots, x_k) = \frac{1}{(2\pi)^{k/2}\,|\mu|^{1/2}} \exp\left[-\frac{1}{2}\sum_{i=1}^{k}\sum_{j=1}^{k}\mu^{ij}x_i x_j\right], \tag{12}$$

where $\mu$ is the covariance matrix, $|\mu|$ is the determinant of this matrix,
and $\mu^{ij}$ is the ijth element in the matrix inverse to $\mu$. Although the assumption
of normality is rather restrictive in nature, it appears that much of the validity
and significance of factor processes rests rather directly on this assumption.


With the underlying normality, it follows that the $m_{ij}$ of (3) have a Wishart
distribution (14, Sect. 11.1) whose probability density function is

$$W(m_{ij};\, n\mu^{ij}) = \frac{\left(\frac{n}{2}\right)^{nk/2} A^{n/2}\, |m_{ij}|^{(n-k-1)/2} \exp\left[-\frac{n}{2}\sum_{i}\sum_{j}\mu^{ij}m_{ij}\right]}{\pi^{k(k-1)/4}\prod_{i=1}^{k}\Gamma\left(\frac{n-i+1}{2}\right)}, \tag{13}$$

where $A = |\mu^{ij}|$.
The $m_{ij}$ defined in (3) are unbiased estimates of $\mu_{ij}$. That they are
also maximum-likelihood estimates of $\mu_{ij}$ may be verified by obtaining the
maximum points of the function

$$\log W = \log K + \frac{n}{2}\log A - \frac{n}{2}\sum_{i}\sum_{j}\mu^{ij}m_{ij}, \tag{14}$$

where K is independent of all $\mu^{ij}$. By taking the partial derivative of this
function with respect to $\mu^{pq}$ for all p, q and setting the resulting equations
equal to zero, we obtain the maximum-likelihood estimates

$$\hat{\mu}_{pq} = m_{pq}. \tag{15}$$
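The maximum-likelihood property stated in (15) can be checked numerically, as a rough sketch rather than a proof. The function below evaluates only the part of (14) that depends on the population matrix; the name `log_w_kernel`, the simulated sample, and the choice n = N - 1 are assumptions made for the illustration.

```python
import numpy as np

def log_w_kernel(mu, m, n):
    """Terms of eq. (14) that depend on mu:
    (n/2) log A - (n/2) sum_ij mu^{ij} m_ij, with A = |mu^{-1}|."""
    mu_inv = np.linalg.inv(mu)
    _, log_a = np.linalg.slogdet(mu_inv)
    return 0.5 * n * log_a - 0.5 * n * np.sum(mu_inv * m)

# Hypothetical sample: N = 50 observations on k = 3 variables.
rng = np.random.default_rng(1)
x = rng.normal(size=(50, 3))
m = np.cov(x, rowvar=False)
n = x.shape[0] - 1                     # assumed degrees of freedom

value_at_m = log_w_kernel(m, m, n)

# Evaluate the same function at nearby symmetric positive-definite matrices.
nearby_values = []
for _ in range(20):
    d = rng.normal(scale=0.05, size=(3, 3))
    mu = m + (d + d.T) / 2.0
    if np.all(np.linalg.eigvalsh(mu) > 0):
        nearby_values.append(log_w_kernel(mu, m, n))
```

Every value in `nearby_values` should fall below `value_at_m`, in line with the statement that the likelihood is greatest when the population covariances are set equal to the sample covariances.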
Now the (в + 1) quantities (mip — uis) are such that 
(то — us) = 0 => 


— _ plè log т) 
бем 8 иди, J” 
In (17) the expectation is over the entire sample space with the subscript 
и? indicating that the parameters џ,, 
It should be remarked that (16), (17), 
some weak restricti 
the Wishart distribution (which is i |) these restrictions 
do hold. It follows at once that the exponent, 


A= > Фиат» — Hin) (Mj = о (18) 
isa 


has a chi-square distribution with degrees of freedom equal to $\tfrac{1}{2}k(k + 1)$
minus the number of independent, linear restrictions among the variables
$(m_{ij} - \mu_{ij})$, or the number of linearly independent
variables $(m_{ij} - \mu_{ij})$.




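In order to obtain a direct way for calculating $\lambda$, consider the Taylor
series expansion with remainder, $R_3$, of $\log W_{k,n}$ about the point $\mu_{ij} = m_{ij}$
in the $\tfrac{1}{2}k(k + 1)$-dimensional parameter space. Thus

$$\log W_{k,n} = \big[\log W_{k,n}\big]_{\mu_{ij}=m_{ij}} + \sum_{i,j}(\mu_{ij} - m_{ij})\Big[\frac{\partial \log W_{k,n}}{\partial \mu_{ij}}\Big]_{\mu_{ij}=m_{ij}} + \frac{1}{2!}\sum_{i,j}\sum_{p,q}(\mu_{ij} - m_{ij})(\mu_{pq} - m_{pq})\Big[\frac{\partial^2 \log W_{k,n}}{\partial \mu_{ij}\,\partial \mu_{pq}}\Big]_{\mu_{ij}=m_{ij}} + R_3. \tag{19}$$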


It can be shown that 


and 
a? log 4 1 
oo щ = 
LE диљди niiemii ка УМ 
Thus for large sample approximations, $R_3$ may be dropped and the expectations of (17) may
be used in the third term on the right of (19). Further the second term on
the right is zero since all of the first partial derivatives are zero at the
maximum point. Hence, using (18) and (14), $\lambda$ becomes

$$\lambda = n\Big\{\log|\mu_{ij}| - \log|m_{ij}| + \sum_{i}\sum_{j}\mu^{ij}m_{ij} - k\Big\}, \tag{20}$$

where $\lambda$ has a chi-square distribution with degrees of freedom as noted
earlier. This same expression was obtained by Lawley (9, Sect. 6) for his
maximum-likelihood technique of factor analysis. The more general development
presented above yields a much wider range of application. The number
of degrees of freedom associated with (20) when applied to the more general
problem (regardless of method of factor solution) is greater than that given
by Lawley. This reflects the fact that the estimates of [...]
in any general method of component analysis [...].
The power of the resulting test is then lower than that of the test furnished
by Lawley. In the next paragraph the theory is developed for fixing the
degrees of freedom of the general problem.

The expression $m_{ij} - \mu_{ij}$ (where $\mu_{ij}$ is as defined in [11]) is commonly
called the residual. The hypothesized covariances after
s (0 < s < k) components have been analyzed will be denoted

$$\mu_{ij} = \sum_{p=1}^{s} a_{ip}a_{jp} + \delta_{ij}a_i a_j. \tag{21}$$

‚ (а) „Ші X asy ie 
(22) 


m; 7 Hii’ 


\ 


(b) P 


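From (7), with the upper limit of summation equal to k,

$$m_{ij} = \sum_{p=1}^{k} a_{ip}a_{jp} + \delta_{ij}a_i a_j. \tag{23}$$

Then using (23) in (22), we have for the residuals

$$r_{ij} = \sum_{p=s+1}^{k} a_{ip}a_{jp}. \tag{24}$$

The matrix of all $r_{ij}$ may be written:

$$(r_{ij}) = \begin{pmatrix} a_{1,s+1} & a_{1,s+2} & \cdots & a_{1k} \\ a_{2,s+1} & a_{2,s+2} & \cdots & a_{2k} \\ \vdots & \vdots & & \vdots \\ a_{k,s+1} & a_{k,s+2} & \cdots & a_{kk} \end{pmatrix} \begin{pmatrix} a_{1,s+1} & a_{2,s+1} & \cdots & a_{k,s+1} \\ a_{1,s+2} & a_{2,s+2} & \cdots & a_{k,s+2} \\ \vdots & \vdots & & \vdots \\ a_{1k} & a_{2k} & \cdots & a_{kk} \end{pmatrix}. \tag{24a}$$

Thus $(r_{ij})$ is a k by k − s matrix times its transpose; and hence, since the
columns of this matrix are linearly independent, it follows that the rank of
$(r_{ij})$ is k − s. [...] these rows and columns is symmetric,
and hence has $\tfrac{1}{2}(k - s)(k - s + 1)$ different elements. Thus, the
original matrix $(r_{ij})$ has exactly $\tfrac{1}{2}(k - s)(k - s + 1)$ linearly independent
elements, and hence the number of degrees of freedom for the chi-square
test of (20) after s common components are removed is

$$\text{d.f.} = \tfrac{1}{2}(k - s)(k - s + 1). \tag{25}$$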
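The count in (25) is simple enough to tabulate directly. The following is a minimal sketch, not part of the original paper; the function name `residual_degrees_of_freedom` is hypothetical, and the three cases shown are the ones that appear in the examples of this paper (k = 4 with s = 2 and s = 3, and k = 13 with s = 3).

```python
def residual_degrees_of_freedom(k: int, s: int) -> int:
    """Degrees of freedom of formula (25): (k - s)(k - s + 1) / 2.

    k is the number of variables and s the number of common components
    removed; the text restricts s to 0 < s < k.
    """
    if not 0 < s < k:
        raise ValueError("formula (25) assumes 0 < s < k")
    # (k - s)(k - s + 1) is a product of consecutive integers, hence even.
    return (k - s) * (k - s + 1) // 2


if __name__ == "__main__":
    for k, s in [(4, 2), (4, 3), (13, 3)]:
        print(f"k={k}, s={s}: d.f. = {residual_degrees_of_freedom(k, s)}")
```

These cases give 3, 1, and 55 degrees of freedom, the values quoted with the chi-square criteria in the worked examples.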

With respect to the computations required in the application of (20),
it should be noted that for large k, the amount of computation is
considerable. However, the inverse and determinant required may be obtained
using the Dwyer-Guttman technique (3, 4). For cases in which s is considerably
less than k this technique effects a considerable saving in the
computations required, and does make the application of (20) feasible.

Example: Assume a sample correlation matrix of the form:

$$(m_{ij}) = \begin{pmatrix} 1.0000 & .5710 & .7330 & .6835 \\ * & 1.0000 & .7812 & .7514 \\ * & * & 1.0000 & .8842 \\ * & * & * & 1.0000 \end{pmatrix}.$$


-—MÀ — 


(Symmetric elements are indicated with the asterisk.) Assume that this
was obtained in a random sample of size 100. A complete and arbitrary
factor solution is

$$(a_{ip}) = \begin{pmatrix} .70 & .30 & .10 & .05 \\ .50 & .60 & .40 & .02 \\ .80 & .50 & .20 & .06 \\ .70 & .60 & .10 & .07 \end{pmatrix}$$

and

$$(\delta_{ij}a_i a_j) = \begin{pmatrix} .4075 & 0 & 0 & 0 \\ 0 & .2296 & 0 & 0 \\ 0 & 0 & .0664 & 0 \\ 0 & 0 & 0 & .1351 \end{pmatrix}.$$

The completeness of the factor solution after two and then after three of
the components in order from left to right in $(a_{ip})$ were removed was tested
by (20). The quantities needed are:

$|m_{ij}| = .0372$.


(wis): 
à Two Components Three Components 
2.1944 .0122 — 1.3399 — 3318 || 2.1908 .0666 —1.3277 — .3782 
asy 13931 1284 | 4 2% —14791 — „7658 


* 
А . 6.1926 —3.5437 ||» ‚ 61578 — 3.4235 
Н 5 . 568]. « М ‚ 4.867 
$|\mu_{ij}|$: .0274 (two components); .0369 (three components)
$\lambda$: 4.36, 3 d.f. (two components); .00, 1 d.f. (three components)

Apparently two components completely factor the matrix, and three
components certainly do.
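The input quantities of this example can be checked numerically. The sketch below is an illustration only, not the Dwyer-Guttman procedure cited in the text: it rebuilds the correlation matrix from the four-column factor solution, assuming (as the unit diagonal of the printed matrix implies) that each uniqueness is one minus the sum of squared loadings, and evaluates the determinant by ordinary Gaussian elimination. The helper names are hypothetical.

```python
def reproduce_covariances(loadings, uniquenesses):
    """Return A A' plus a diagonal matrix of uniquenesses."""
    k = len(loadings)
    return [
        [
            sum(a * b for a, b in zip(loadings[i], loadings[j]))
            + (uniquenesses[i] if i == j else 0.0)
            for j in range(k)
        ]
        for i in range(k)
    ]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det  # a row swap changes the sign
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= factor * a[c][j]
    return det


# Loadings of the "complete and arbitrary factor solution" in the example.
LOADINGS = [
    [0.70, 0.30, 0.10, 0.05],
    [0.50, 0.60, 0.40, 0.02],
    [0.80, 0.50, 0.20, 0.06],
    [0.70, 0.60, 0.10, 0.07],
]
# Assumption: unit diagonal, so uniqueness = 1 - communality.
UNIQUENESSES = [1.0 - sum(a * a for a in row) for row in LOADINGS]
M = reproduce_covariances(LOADINGS, UNIQUENESSES)

if __name__ == "__main__":
    for row in M:
        print("  ".join(f"{value:.4f}" for value in row))
    print("determinant:", round(determinant(M), 4))
```

Under these assumptions the off-diagonal entries agree with the printed correlation matrix to four decimals, and the determinant comes out near .037.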
3. Sampling Variation of Component Loadings. In this section another
sampling problem concerned with the variation in the magnitude of component
loadings due to random sampling is studied, together with a test
for the significance of individual component loadings. This section deals
exclusively with orthogonal solutions, and the next section generalizes the
concept to include sampling variability in the oblique component solutions.

It is necessary to set up an underlying definition that fits the
ordinary interpretation of the factor analysis problem and admits of
mathematical expression. Assume that a complete factor solution in the sense of
Section 2 is available. The covariances yielded by this solution,

$$\mu_{ij} = \sum_{p=1}^{s} a_{ip}a_{jp} + \delta_{ij}a_i a_j,$$

will be regarded as adequately quantizing the unknown population covariances.
Define the variation, x, permissible in any component loading $a_{ip}$ to
be such that if $a_{ip} + x$ is substituted for $a_{ip}$ in the above expression, the
new covariances (and covariance matrix whose elements are these covariances)
must be such that they are within a region of random sampling variability
from $\mu_{ij}$ (and the matrix $(\mu_{ij})$). The sample size considered, of course, is N.
Further the problem is specialized to hold the diagonal entries fixed. This is
accomplished by adjusting the uniqueness.


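In keeping with the definition set forth in the preceding paragraph,
it is desired to determine extreme values for x such that if

$$a_{ip}(x) = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1p} & \cdots & a_{1s} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{ip} + x & \cdots & a_{is} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{kp} & \cdots & a_{ks} \end{pmatrix} \tag{26}$$

and

$$a_i(x) = \begin{pmatrix} a_1 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & a_2 & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots & & \vdots \\ 0 & 0 & \cdots & a_i(x) & \cdots & 0 \\ \vdots & \vdots & & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 & \cdots & a_k \end{pmatrix}, \tag{27}$$

then the covariance matrix of a random sample of size N may be

$$m_{ij}(x) = a_{ip}(x)\,a_{ip}(x)' + a_i(x)\,a_i(x)' = \begin{pmatrix} \mu_{11} & \mu_{12} & \cdots & \mu_{1i} + a_{1p}x & \cdots & \mu_{1k} \\ \mu_{21} & \mu_{22} & \cdots & \mu_{2i} + a_{2p}x & \cdots & \mu_{2k} \\ \vdots & \vdots & & \vdots & & \vdots \\ \mu_{i1} + a_{1p}x & \mu_{i2} + a_{2p}x & \cdots & \mu_{ii} & \cdots & \mu_{ik} + a_{kp}x \\ \vdots & \vdots & & \vdots & & \vdots \\ \mu_{k1} & \mu_{k2} & \cdots & \mu_{ki} + a_{kp}x & \cdots & \mu_{kk} \end{pmatrix}. \tag{28}$$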


The results of the analysis of Section 2 provide a ready answer to the problem.
Formula (20) yields a measure of how much population and sample may
"be apart" at any desired significance level, where the number of degrees of
freedom (which in this case is k − 1, as can be seen from the second form
of [28]) and the significance level fix $\lambda$; n = N − 1; $|\mu_{ij}|$ and $(\mu^{ij})$ are fixed
by the factor solution; and $|m_{ij}|$ and $m_{ij}$ contain the variable x.

From a computational point of view, it is simpler if the component 
loading studied is in the last row of the matrix of common component loadings. 
This can be accomplished by placing the ith variable in the last position 
in the original data and accomplishing the corresponding change in the 
matrix и. However, the general position will be maintained in the following 
development to emphasize the generality of the method.

In the calculation of the quantities needed for the application of (20),
it can be shown directly by writing out the expansion that

$$\sum_{i}\sum_{j}\mu^{ij}m_{ij}(x) = k + 2S_{ip}x, \tag{29}$$

where

$$S_{ip} = \sum_{j=1,\; j \neq i}^{k} \mu^{ij}a_{jp}. \tag{30}$$

Further, it can be shown by straightforward algebraic reduction that

$$|m_{ij}(x)| = |\mu_{ij}| + 2|\mu_{ij}|\,S_{ip}x + D_{ip}x^2, \tag{31}$$

where

$$D_{ip} = \begin{vmatrix} \mu_{11} & \mu_{12} & \cdots & a_{1p} & \cdots & \mu_{1k} \\ \mu_{21} & \mu_{22} & \cdots & a_{2p} & \cdots & \mu_{2k} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{1p} & a_{2p} & \cdots & 0 & \cdots & a_{kp} \\ \vdots & \vdots & & \vdots & & \vdots \\ \mu_{k1} & \mu_{k2} & \cdots & a_{kp} & \cdots & \mu_{kk} \end{vmatrix}. \tag{32}$$

For factorization of correlation matrices x is small and the term $D_{ip}x^2$ may
be omitted for an approximation. If it is to be carried for more accuracy,
then $D_{ip}$ may be evaluated directly or may be approximated by various
techniques to the desired degree of accuracy.

The expression (20) may be written with the notation introduced in
this section and with $\lambda$ transposed to the right as

$$f(x) = \log|\mu_{ij}| - \log\{|\mu_{ij}| + 2|\mu_{ij}|\,S_{ip}x + D_{ip}x^2\} + 2S_{ip}x - \lambda/n, \tag{33}$$

where $\lambda$ is fixed at the desired probability level for k − 1 degrees of freedom.
The roots of f(x) = 0 are then the values of x of permissible variation in
the component loading $a_{ip}$. There will be two such roots near x = 0 and
opposite in sign. In order to obtain a test of the significance of the component
loading $a_{ip}$, replace x by $-a_{ip}$ in all of the above considerations.
Example: Consider the example presented in Section 2 and the common
factor solution consisting of the first three columns of $(a_{ip})$. In determining
the variation possible in $a_{11}$ at the 5% significance level again using the
sample size 100, the quantities needed in the application of (33) are

$|\mu_{ij}| = .0388$ ($|\mu_{ij}|$ here has unity for all of the diagonal elements),
$S_{11} = -1.2739$,
$D_{11} = -.0574$.

Substituting these values into (33), we have

$$f(x) = \log .0388 - \log\{.0388 - .0988x - .0574x^2\} - 2.5478x - .0789.$$

Roots of f(x) are

$x_1 = .11$,
$x_2 = -.15$,

and thus $.55 < a_{11} < .81$ with a 5% significance level. Approximate calculations
were also made dropping off the term $D_{11}x^2$.
These yielded

$x_1 = .14$,
$x_2 = -.18$.
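The roots quoted in this example can be recovered numerically from (33). The sketch below is one way to do so, assuming natural logarithms, n = 99, and the usual tabled 5% point of chi-square for 3 degrees of freedom (7.815, which gives the constant .0789 of the example). Bisection is used here only for simplicity; the paper does not say how the roots were found, and the names `f` and `bisect` are hypothetical.

```python
import math

DET_MU = 0.0388      # |mu_ij| with unit diagonal, from the example
S = -1.2739          # S for the loading a_11
D = -0.0574          # D for the loading a_11
N_MINUS_1 = 99       # n = N - 1 for a sample of 100
CHI2_05_3DF = 7.815  # tabled 5% point of chi-square, 3 d.f. (assumed)


def f(x, include_quadratic=True):
    """Formula (33); set include_quadratic=False to drop the D x^2 term."""
    det_m = DET_MU + 2.0 * DET_MU * S * x
    if include_quadratic:
        det_m += D * x * x
    return (math.log(DET_MU) - math.log(det_m)
            + 2.0 * S * x - CHI2_05_3DF / N_MINUS_1)


def bisect(func, lo, hi, tol=1e-10):
    """Root of func on [lo, hi]; func(lo) and func(hi) must differ in sign."""
    f_lo, f_hi = func(lo), func(hi)
    if f_lo * f_hi > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


# f(0) is negative and f is positive at +/-0.3, so each interval brackets a root.
upper = bisect(f, 0.0, 0.3)
lower = bisect(f, -0.3, 0.0)
upper_approx = bisect(lambda x: f(x, include_quadratic=False), 0.0, 0.3)
lower_approx = bisect(lambda x: f(x, include_quadratic=False), -0.3, 0.0)

if __name__ == "__main__":
    print("with D x^2:   ", round(lower, 3), round(upper, 3))
    print("without D x^2:", round(lower_approx, 3), round(upper_approx, 3))
```

Both pairs of roots land within about .005 of the two-decimal values printed in the example, which is the agreement one would expect given the rounding of the inputs.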
4. Application to the Sampling Variation in Oblique Component Loadings.
In the application of component analysis in psychology, the final form of
the results is often presented in terms of correlated components. In terms
of these final components we have equation (5) of Section 1 for the population.
Since the correlations between the common components are known,
it is possible to calculate the changes in the reproduced covariances
corresponding to certain changes in the component loadings in a manner similar
to that employed in the previous section.

In order to use a notation comparable to that of Section 3, denote the
reproduced covariances for s common components as:

$$\mu_{ij} = \sum_{p=1}^{s} a_{ip}a_{jp} + \sum_{p=1}^{s-1}\sum_{q=p+1}^{s}(a_{ip}a_{jq} + a_{iq}a_{jp})\,r_{F_pF_q} + \delta_{ij}a_i a_j. \tag{34}$$

Now consider the amount x by which $a_{ip}$ can be changed and still have it
remain within sampling variation of the actual $a_{ip}$ computed. To determine
x substitute $a_{ip} + x$ for $a_{ip}$ in (34) and write the result as $m_{ij}$. Then, as in
Section 3, holding the diagonal fixed ($m_{ii} = \mu_{ii}$) by adjusting the uniqueness,
the new reproduced covariances are:

$$m_{ij} = \mu_{ij} + b_{jp}x, \qquad (j = 1, 2, \ldots, k;\; j \neq i), \tag{35}$$

where

$$b_{jp} = a_{j1}r_{F_1F_p} + a_{j2}r_{F_2F_p} + \cdots + a_{js}r_{F_sF_p}. \tag{36}$$
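A small numerical sketch of (36) may help fix the notation. It is an illustration only: the function name `oblique_b` is hypothetical, and the loadings and component intercorrelations used are those given for Test 5 in the oblique solution of the illustrative example (Table 3), with $r_{F_pF_p}$ taken as unity.

```python
def oblique_b(loadings_j, component_corr, p):
    """b_jp of formula (36): sum over q of a_jq * r(F_q, F_p).

    loadings_j     -- the s oblique loadings of variable j
    component_corr -- s x s matrix of component intercorrelations
    p              -- zero-based index of the component whose loading varies
    """
    return sum(a_jq * component_corr[q][p] for q, a_jq in enumerate(loadings_j))


# Component intercorrelations from the oblique solution of the example.
R_F = [
    [1.000, 0.587, 0.449],
    [0.587, 1.000, 0.461],
    [0.449, 0.461, 1.000],
]
TEST_5 = [-0.058, 0.801, 0.087]  # loadings of Test 5 on components 1-3

b_51 = oblique_b(TEST_5, R_F, 0)
b_52 = oblique_b(TEST_5, R_F, 1)

if __name__ == "__main__":
    print(round(b_51, 3), round(b_52, 3))
```

The two values agree to three decimals with the entries for Test 5 in the computational summary of Section 5 (.451 and .807).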


Then the adjusted matrix $m_{ij}(x)$ takes on a form similar to that of (28)
with $b_{jp}$ replacing $a_{jp}$.

Now in much the same way as before define

$$T = \sum_{j=1,\; j \neq i}^{k} \mu^{ij}b_{jp}. \tag{37}$$

Then approximate ranges for x are obtained by solving the equation

$$0 = \log|\mu_{ij}| - \log\{|\mu_{ij}| + 2|\mu_{ij}|\,Tx\} + 2Tx - \lambda/n, \tag{38}$$

where n = N − 1 and $\lambda$ is fixed by the significance level with k − 1 degrees
of freedom. To obtain exact ranges, one should carry a term comparable to
$D_{ip}x^2$ of Section 3 with $b_{jp}$ replacing $a_{jp}$. Since x is small the approximation
of (38) is quite good. Again it is possible to test the significance of a single
component loading by simply placing $x = -a_{ip}$ in (38). In this case the
criterion becomes

$$\lambda = n\{\log|\mu_{ij}| - \log[\,|\mu_{ij}| - 2|\mu_{ij}|\,Ta_{ip}\,] - 2Ta_{ip}\}, \tag{39}$$

where $\lambda$ has a chi-square distribution of k − 1 degrees of freedom. An
application of (39) is made in the last part of the illustrative example presented
in the next section.
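Since $|\mu_{ij}|$ cancels inside (39), the criterion depends only on n and the product $2Ta_{ip}$: it reduces to $n\{-\log(1 - 2Ta_{ip}) - 2Ta_{ip}\}$. The sketch below evaluates it under that simplification, with natural logarithms assumed. The function name is hypothetical, and the inputs are the values reported for the loading $a_{13,1}$ in the illustrative example (T = −.940, loading .351, n = 144).

```python
import math


def loading_significance(T, a_ip, n):
    """Chi-square criterion of formula (39) for a single oblique loading.

    Uses the algebraically equivalent form n * (-log(1 - 2*T*a) - 2*T*a),
    obtained by cancelling |mu_ij| inside the logarithms of (39).
    """
    u = 2.0 * T * a_ip
    if u >= 1.0:
        raise ValueError("1 - 2*T*a_ip must be positive for the logarithm")
    return n * (-math.log(1.0 - u) - u)


lam_13_1 = loading_significance(T=-0.940, a_ip=0.351, n=144)

if __name__ == "__main__":
    print(round(lam_13_1, 1))
```

With these inputs the criterion comes out at about 22.0, to be referred to chi-square with k − 1 = 12 degrees of freedom.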

5. Illustrative Example. In order to apply the results of preceding sections,
a thirteen-variable example discussed by Holzinger and Harman is
considered (7, 30, 189, Appendix B, and Appendix E). The variables are
certain psychological tests which are described briefly in Appendix B, p. 309.
The table of intercorrelations is given on p. 30, and reproduced here as Table 1.

TABLE 1 


Intercorrelations of Thirteen Tests? 

wee A 2 à И i в 7 8 9 igo Mar % лв 
1 4.00 ais 408 48 И 5285 301 -332 En ШР S E 80 
2 1.000: „317 1880 1.585 286 ый. сй UD. 250, от 239 
2 Ио gos зат 368 ay 950 180 E078 «091 A40 87 
: PEE 301 35  .099  .110 160 .327 
E C E x Тв 128 31 344 0215 4344 
6 IMMER EU ue зы o .300 
7 б .000  .619 685 240  .232 181  .845 
8 1.000 .532 265 300 .271 895 
: 1.000 .170 .280 „1з  .280 
10 1.000 .484 .585 .408 
15 1.000  .498 .535 
15 1.000 .512 
13 1.000 


*Reproduced from Holzinger and Harman (7, 30). 


The uniquenesses were then estimated, the common covariance matrix
was obtained, and a three-component centroid solution was calculated
(7, 189). The results are repeated in Table 2.


TABLE 2
Centroid Solution for Thirteen Tests*

                            Component Loadings
Variable   Uniqueness      C1       C2       C3
    1         .442        .607    -.060    -.443
    2         .797        .355     .038    -.266
    3         .638        .418     .148    -.429
    4         .686        .478     .083    -.287
    5         .354        .729     .257     .244
    6         .359        .707     .354     .167
    7         .250        .721     .367     .257
    8         .429        .705     .197     .062
    9         .242        .698     .409     .252
   10         .446        .455    -.482     .399
   11         .551        .537    -.390     .145
   12         .469        .487    -.553     .033
   13         .401        .674    -.368    -.135

*Reproduced from Holzinger and Harman (7, 189).


In order to test the hypothesis that this factorization reproduces the
intercorrelations, formula (20) was applied. The quantities needed are

$|\mu_{ij}| = .00412$,


i 

(u?): 

Vari- 

able 1 2 3 4 5 6 7 8 9 10 п 12 13 
1 1.000 —.194 — 350 7.202 —.023 — 086 с _ 350 —.072 — 173. .. 417 
2 1.190 —.121 —.089 — 014 —.042 — .016 — +079 »— (08 6ш — gig 
3 1.883 —.165 —.010 — 076 —.023 — „#20 .089 LONI — 155 
4 1.336 —.041 —.078 — 056 _ "106 —:006 = „023: — 153 
5 2.406 —.399 — 634 _ 7189 —.008 = „058 -.086 
6 2.370 —.632 — —004 —:091 оу — 089 
7 2.099 — 104 —.053 оер —:001 
8 2 +040 —:070.. 05a — 156 
9 CO. 011, Ma ‘055 
10 1.550 — 374 — 464 — 314 
п 1.567 —.331 — 306 
| 1.658 — .459 


Ll. 027, — and 


Then, placing these results in (20), we have

$$\lambda = 144\{\log .00412 - \log .00240 + .027\} = 81.6, \quad 55 \text{ degrees of freedom.}$$

Using $\sqrt{2\chi^2} - \sqrt{2n - 1}$ and the normal approximation, we have

$$t = 2.33.$$

But P(t > 2.33) < .01. Thus it appears that a fourth component really
should be sought.
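The arithmetic of this test is easy to retrace. The following sketch assumes natural logarithms and reads the constant .027 in the printed expression as the quantity $\sum\sum \mu^{ij}m_{ij} - k$ of (20). The function names are hypothetical, and in the normal approximation the n under the second radical is taken to be the number of degrees of freedom (55).

```python
import math


def chi_square_criterion(n, det_mu, det_m, trace_excess):
    """Formula (20): n * (log|mu| - log|m| + trace_excess).

    trace_excess stands for sum_ij mu^{ij} m_ij - k (an interpretation of
    the .027 in the worked example, not a value recomputed here).
    """
    return n * (math.log(det_mu) - math.log(det_m) + trace_excess)


def normal_deviate(chi2, df):
    """Large-d.f. normal approximation: sqrt(2*chi2) - sqrt(2*df - 1)."""
    return math.sqrt(2.0 * chi2) - math.sqrt(2.0 * df - 1.0)


lam = chi_square_criterion(n=144, det_mu=0.00412, det_m=0.00240,
                           trace_excess=0.027)
t = normal_deviate(lam, df=55)

if __name__ == "__main__":
    print(round(lam, 1), round(t, 2))
```

Carried at full precision the criterion is about 81.7 and the deviate about 2.34; the small differences from the printed 81.6 and 2.33 are consistent with rounding of intermediate values, and the conclusion (P < .01) is unchanged.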

Formula (39) was then applied to test the significance of two of the
oblique component loadings of this example. The oblique solution is given
by Holzinger and Harman (7, 250, 251) and is reproduced in Table 3.


TABLE 3
Oblique Solution for Thirteen Tests*

               Component Loadings
Variable      1        2        3          Adjusted Uniqueness
    1        .731    -.089     .142               .432
    2        .441     .004     .004               .802
    3        .721    -.090    -.142               .619
    4        .508     .090    -.003               .682
    5       -.058     .801     .087               .343
    6        .037     .809    -.051               .347
    7       -.068     .901    -.030               .279
    8        .155     .591     .078               .460
    9       -.068     .919    -.081               .282
   10       -.385     .164     .809               .401
   11       -.039     .077     .659               .539
   12        .073    -.177   [illegible]          .456
   13        .351    -.061     .594               .392

Intercorrelation of Components ($r_{F_pF_q}$):

          F1       F2       F3
F1      1.000     .587     .449
F2               1.000     .461
F3                        1.000

*Reproduced from Holzinger and Harman (7, 250, 251).


The adjusted uniquenesses are such that the diagonal entries in the reproduced
covariance matrix are all unity. These are used in the application of the
Dwyer-Guttman technique used in calculating the inverse and determinant
of our basic covariance matrix. Actual calculation of these quantities yields

$|\mu_{ij}| = .00425$

and $(\mu^{ij})$ is:
Vari- 
able 1 2 3 4 5 6 7 8 9 10 1 12 13 
1 1.697 —.195 —.363 —.266 —. - 03 —.076 
2 1.184 —.122 —.087 —. - .084 — .006 
3 1.372 —.168 —, -. 243 —.037 
4 1.345 —. -. 114 —.008 
5 2 = A78 —.101 
6 -. .008 — .023 
7 -. 101 —.049 
8 1. .041 —.006 : 
9 043 —.011 5 М 
10 -685 —.407 —.507 —.341 
11 1.607 —.334 —.309 
12 .708 — .469 
13 1.972 


The common component loadings $a_{12,2}$ and $a_{13,1}$ of the above factor
pattern were tested for their significance. The following is the summary of
the computation following the notation of Section 4.

        $a_{12,2}$                      $a_{13,1}$
$b_{1,2}$      .406           $b_{1,1}$      .743
$b_{2,2}$      .265           $b_{2,1}$      .445
$b_{3,2}$      .268           $b_{3,1}$      .604
$b_{4,2}$      .387           $b_{4,1}$      .559
$b_{5,2}$      .807           $b_{5,1}$      .451
$b_{6,2}$      .807           $b_{6,1}$      .489
$b_{7,2}$      .847           $b_{7,1}$      .447
$b_{8,2}$      .718           $b_{8,1}$      .537
$b_{9,2}$      .842           $b_{9,1}$      .435
$b_{10,2}$     .311           $b_{10,1}$     .075
$b_{11,2}$     .358           $b_{11,1}$     .302
                              $b_{12,1}$     .316
$b_{13,2}$     .419
$T$       [illegible]         $T$           -.940
$\lambda$       1.57          $\lambda$      22.0

With 12 degrees of freedom, we conclude that $a_{12,2}$ is
not significantly different from zero and that $a_{13,1}$ is significant [...]
the 2% level.


REFERENCES

1. Bartlett, M. S. Tests of significance in factor analysis. Brit. J. Psychol., Statist. Sect., 1950, 3, 77-85.
2. Coombs, C. H. A criterion for significant common factor variance. Psychometrika, 1941, 6, 267-272.
3. Dwyer, P. S. The evaluation of multiple and partial correlation coefficients from the factorial matrix. Psychometrika, 1940, 5, 211-232.
4. Guttman, L. Multiple rectilinear prediction and the resolution into components. Psychometrika, 1940, 5, 75-79.
5. Hoel, P. G. A significance test for minimum rank in factor analysis. Psychometrika, 1939, 4, 149-158.
6. Hoel, P. G. A significance test for component analysis. Ann. math. Statist., 1937, 8, 149-158.
7. Holzinger, K. J., and Harman, H. H. Factor analysis. Chicago: Univ. Chicago Press, 1941.
8. Hotelling, H. Analysis of a complex of statistical variables into principal components. J. educ. Psychol., 1933, 24, 417-441, 498-520.
9. Lawley, D. N. Factor loadings by the method of maximum likelihood. Proc. roy. Soc. Edinb., 1940, 60, 64-82.
10. Lawley, D. N. Further investigations in factor estimation. Proc. roy. Soc. Edinb., 1942, 61, 176-183.
11. Lawley, D. N. Problems in factor analysis. Proc. roy. Soc. Edinb., 1947, 62, 394-399.
12. McNemar, Q. On the number of factors. Psychometrika, 1942, 7, 9-18.
13. Wald, A. Notes on the theory of statistical estimation and testing hypothesis. Unpublished lecture notes, Columbia University.
14. Wilks, S. S. Mathematical statistics. Princeton, N. J.: Princeton Univ. Press, 1946.


Manuscript received 11/4/52

Revised manuscript received 2/20/53


PSYCHOMETRIKA—VOL. 18, NO. 3
SEPTEMBER, 1953


APPROXIMATING MAXIMUM TEST VALIDITY 
BY A NON-PARAMETRIC METHOD 


HAROLD WEBSTER* 
UNIVERSITY OF KENTUCKY 


The Gleser-DuBois conditions for selecting from a number of test items
those which will maximize the correlation between total test score and criterion
will degenerate into expressions involving only item counts on total distributions
and the upper halves of distributions. A grouping convention for scores
near medians is recommended. The inefficiency of the method is easily
compensated for, because, regardless of the size of the sample, only standard
test-scoring equipment and brief computations are required. A procedure is
outlined, and some applications are discussed.


1. Introduction. Gleser and DuBois† discuss methods for determining
which items of an experimental test to retain in order to approximate maximum
test-criterion validity. They describe a new method which has the desirable
feature of allowing for the changes in item-test correlations which
occur after a first selection of items, so that additional items may then be
added or dropped in order to achieve still higher validity. The method as
described by the authors requires item-test and item-criterion point-biserial
correlations at each cycle, in addition to a product-moment correlation
between test and criterion. The purpose of this paper is to describe a less
laborious, though analogous, procedure with which dependable results may
be obtained for dichotomized items when N is large.

2. Derivation of the conditions for item selection. The Gleser-DuBois
condition to be met by item j in order to be retained in the first sub-test is
that its ratio of point-biserials

$$r_{jc}/r_{jt} > r_{tc} \ \text{ for } \ r_{jt} > 0 \qquad \text{or} \qquad r_{jc}/r_{jt} < r_{tc} \ \text{ for } \ r_{jt} < 0, \tag{1}$$

t and c referring to test and criterion, respectively. By reference to correlation
formulas,‡ (1) may be seen to be equivalent to

$$\frac{(\bar{X}_{jc} - \bar{X}_c)/S_c}{(\bar{X}_{jt} - \bar{X}_t)/S_t} \gtrless r_{tc}, \ \text{ for } (\bar{X}_{jt} - \bar{X}_t) \text{ positive or negative, respectively,§} \tag{2}$$

*Now at Vassar College, Poughkeepsie, New York.
†Gleser, G. C., and DuBois, P. H. A successive approximation method of maximizing
test validity. Psychometrika, 1951, 16, 129-139.
‡See, for example, Guilford, J. P. Fundamental statistics in psychology and education
(2nd Ed.). New York: McGraw-Hill, 1950.
§The convention here and in subsequent expressions is that "positive" is read
after >, "negative" after <.


where $\bar{X}_{jc}$, $\bar{X}_{jt}$ are means on criterion and test for those individuals giving
one kind of (correct or preferred) response to item j, and $\bar{X}_c$, $\bar{X}_t$, $S_c$, $S_t$ are
the means and standard deviations for criterion and test.

If both criterion and test distributions are now cut at their medians, so
that scores which fall above medians are arbitrarily assigned a value 1 and
those below are assigned a value 0, both means and both standard deviations
become $\tfrac{1}{2}$, and (2) may be rewritten

$$\frac{\bar{X}_{jc} - \tfrac{1}{2}}{\bar{X}_{jt} - \tfrac{1}{2}} \gtrless \phi_{tc}, \ \text{ for } (\bar{X}_{jt} - \tfrac{1}{2}) \text{ positive or negative,} \tag{3}$$

where $\phi_{tc}$ is a fourfold point-correlation coefficient. Since both test scores
and criterion scores have been forced into 50-50 dichotomies, $r_{tc}$ has become

$$\phi_{tc} = \frac{4a_{tc} - N}{N} = \frac{4a_{tc}}{N} - 1, \tag{4}$$

where $a_{tc}$ is the number of individuals above the criterion median who also have a test
score which falls above the test median. [...] the means
in (3) may be written $a_{jc}/n_j$ and $a_{jt}/n_j$, where $a_{jt}$, for example, is the number
of correct responses on item j given by those above
the median test score, divided by the total number $n_j$ of correct responses.
Making these substitutions, and calling the index for the initial selection of
items $I_1$, (3) becomes

$$I_1 = \frac{2a_{jc} - n_j}{2a_{jt} - n_j} \gtrless \frac{4a_{tc}}{N} - 1, \ \text{ for } (2a_{jt} - n_j) \text{ positive or negative.} \tag{5}$$


The items selected by (5) form a sub-test $T_1$. The Gleser-DuBois condition
for further selection, either by adding new items or by dropping items
now in $T_1$, is

$$\frac{r_{jc}}{r_{jt_1} \pm S_j/2S_{t_1}} \gtrless r_{t_1c}, \ \text{ for } (r_{jt_1} \pm S_j/2S_{t_1}) \text{ positive or negative,} \tag{6}$$

the minus sign to be used only for items previously selected by (1), and the
subscript $t_1$ referring to sub-test $T_1$. Substituting the values of the
point-biserial coefficients, (6) becomes

$$\frac{(\bar{X}_{jc} - \bar{X}_c)\,S_{t_1}}{(\bar{X}_{jt_1} - \bar{X}_{t_1} \pm \tfrac{1}{2}q_j)\,S_c} \gtrless r_{t_1c}, \tag{7}$$

where $q_j$ is the proportion of incorrect (nonpreferred) responses for item j.
For dichotomized test and criterion distributions, the means and standard
deviations of (7) are changed as before [for (3) and (5)], the only difference
being that t is now $t_1$. These plus the additional substitution in (7) of
$q_j = 1 - n_j/N$ result in a condition, analogous to (6), for a second selection of
items by means of a second item index $I_2$. Item j is selected to be in the
second sub-test $T_2$ if it was previously selected by (5) and if

$$I_2 = \frac{2a_{jc} - n_j}{2(a_{jt_1} - n_j) + n_j^2/N} \gtrless \frac{4a_{t_1c}}{N} - 1, \ \text{ for positive or negative } 2(a_{jt_1} - n_j) + n_j^2/N, \tag{8a}$$

or if it was previously rejected by (5) and if

$$I_2 = \frac{2a_{jc} - n_j}{2a_{jt_1} - n_j^2/N} \gtrless \frac{4a_{t_1c}}{N} - 1, \ \text{ for positive or negative } 2a_{jt_1} - n_j^2/N. \tag{8b}$$

Values used in (8a) and (8b) are the same as those in (5), except that $a_{jt_1}$ is
the number of correct responses on item j in the upper half of the $T_1$ distribution,
and $a_{t_1c}$ is now also obtained by using $T_1$ instead of T. After rescoring
items selected to comprise $T_2$, (8a) and (8b) may be applied again for further
selection if this should be necessary.
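Condition (5) can be sketched in a few lines. The sketch is illustrative only: the function name and the counts in the usage lines are hypothetical, not data from this paper. Because the inequality reverses when the denominator $2a_{jt} - n_j$ is negative, both cases reduce to the single cross-multiplied test $(2a_{jc} - n_j)N > (4a_{tc} - N)(2a_{jt} - n_j)$, which also avoids the division. Treating a zero denominator by the same cross-multiplied rule is an assumption, since the text does not cover that case.

```python
def select_by_index_1(a_jc, a_jt, n_j, a_tc, N):
    """Initial item selection by condition (5), using counts only.

    a_jc -- correct responses to item j among sheets above the criterion median
    a_jt -- correct responses to item j among sheets above the test median
    n_j  -- total correct responses to item j
    a_tc -- sheets above both the test median and the criterion median
    N    -- total number of sheets
    """
    numerator = 2 * a_jc - n_j
    denominator = 2 * a_jt - n_j
    # I_1 > phi for positive denominator and I_1 < phi for negative
    # denominator are both equivalent to this inequality (N > 0).
    return numerator * N > (4 * a_tc - N) * denominator


if __name__ == "__main__":
    # Hypothetical counts: N = 100 sheets, 35 above both medians (phi = .40).
    print(select_by_index_1(a_jc=38, a_jt=40, n_j=60, a_tc=35, N=100))  # I_1 = .8
    print(select_by_index_1(a_jc=32, a_jt=40, n_j=60, a_tc=35, N=100))  # I_1 = .2
```

In the first hypothetical item the index exceeds the fourfold coefficient and the item is retained; in the second it does not. Working in integers keeps the comparison exact, in the spirit of the remark in Section 3 that the division is usually unnecessary.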
3. Procedure for selecting the items. Items satisfying conditions (5) and
(8a) or (8b) are most easily selected by the following procedure.

(1) Obtain the responses for the items from which the final test is to be
selected on IBM answer sheets. Preferably there should be a large even
number of subjects.

(2) Score the answer sheets, recording two scores on each, the total test
score T and the criterion score C.

(3) Separate the sheets into two equal piles, those with C score above
and those with C below the median C.

(4) Using only the N/2 sheets with high C scores, first count and record
$a_{jc}$, the number of correct or preferred responses present for each item j,
and then obtain $a_{tc}$ by counting the number of total test scores T which
fall above median T.

(5) Using only the N/2 sheets with low C scores, obtain the number of
correct responses for each item j. Add these to the $a_{jc}$ obtained in Step (4)
to obtain $n_j$, the total number of correct responses for each item j.

(6) From the N/2 sheets with T scores above median T obtain $a_{jt}$, the
number of correct responses present for each item j.

(7) Obtain the ratio $I_1$ for each item (it is usually unnecessary to perform
the division), apply condition (5), and rescore the sheets with the total scores
$T_1$ obtained by using only the items selected by (5).

(8) From the N/2 sheets with $T_1$ scores above median $T_1$ obtain $a_{t_1c}$,
the number with C scores above median C, and $a_{jt_1}$, the number of correct
responses present for each item j.

(9) Obtain the ratio $I_2$ for each item, apply condition (8a) or (8b), and,
if further changes occur, rescore the sheets with $T_2$, the total scores using
only the items selected by (8a) or (8b).

(10) Repeat Steps 8 and 9 for $T_2$ and subsequent sub-tests if necessary.

The division of C scores into those above and below the median (Step
4) presents an additional kind of problem if a number of sheets must be
chosen from a larger number, all of which have the same [...]
overlapping the median. For very large N [...]
suffice, but for ordinary purposes, bias [...]
by selecting those cases needed from the sheets [...]
T. For example, if 5 out of 12 scores C = 14 were [...],
then the 5 chosen would be those with T [...].
[...] procedure may be used in dividing
the N sheets into high and low T scores when it is necessary to select some of
them having identical values: the chosen portion would be those sheets
having C scores nearest median C. With increasing N and increasing score
ranges for T and C, the chance of bias from a poor assignment decreases.
4. Some applications. 'The method which has b 


applied to two experimental tests, one (V = 
small or moderate inter-item and item. 


of 3 items, and for the second test it was тег 


jected Item 1, resulting in a 


the small size of the sample, 
*Op. cit., Table 1, p. 138. 




these failures of the non-parametric method to obtain maximum validity 
are not surprising. 
There is no reason to suppose that item-selection methods based upon 
regression of criteria on item composites can lessen the need for cross-valida- 
tion. On the contrary, such methods utilize inter-item covariation and are 
therefore at least as sensitive to sample peculiarities as less precise methods. 
In order to study the cross-validation problem further, the method was 
applied to a 20-item test which had been administered to 500 subjects for 
whom a continuous criterion was also available. The latter consisted of a 
masculinity-femininity scale with a range 0-9. The 500 cards containing the 
data were first divided into random halves, A and B, and the item selection 
was then carried out using only the 250 cases of Sample A. Validity
coefficients for Sample B, based on the same items before and after selection,
were then also computed and may be compared in Table 1 with those of
Sample A. As expected, the correlation for the shortened test on the
cross-validation sample (.471) is less than that for the original sample (.495),
although the difference is not significant.


TABLE 1
Cross-Validation Data for Maximizing Test Validity by Item Selection*

                               Before Selection (20 Items)                After Selection (10 Items)
Sample                  N    $\bar{X}_t$  $\sigma_t$  $\bar{X}_c$  $\sigma_c$  $r_{tc}$     $\bar{X}_t$  $\sigma_t$  $\bar{X}_c$  $\sigma_c$  $r_{tc}$
Original (A)           250    9.57   2.81   4.47   2.02   .429       4.06   1.08   4.47   2.02   .495
Cross-Validation (B)   250    9.70   2.83   4.52   1.94   .431       5.13   2.00   4.52   1.94   .471

*Subscripts t and c stand for total test score and criterion score, respectively.


In none of the cases discussed above was there any change in the items
selected due to applications of (8a) and (8b), which suggests that these
conditions are of limited practical value. Condition (6), when applied by
Gleser and DuBois to their hypothetical 10-item test, resulted in no changes
beyond those effected by (1). These authors cite a case (p. 137), however, in
which the validity of a longer test was increased slightly by using (6).

In summary, condition (5) appears to provide a practical criterion for
selecting those items which will maximize total test validity. Perhaps
applications to tests comprised of more items than those discussed in this paper
would be helpful in determining the usefulness of (8a) and (8b).


Manuscript received 2/7/53 


Revised manuscript received 3/28/53 


PSYCHOMETRIKA—VOL. 18, NO. 3
SEPTEMBER, 1953 


A NOTE ON THE NEYMAN-JOHNSON TECHNIQUE* 


ROBERT P. ABELSON
YALE UNIVERSITY


[...]tional and [...] of testing the significance of the [...]
with the criterion [...] this type of problem is the analysis of covariance (9). The
Neyman-Johnson technique (7) provides another [...]
approach. A computational procedure is suggested [...]
without an undue increase in computational
labor. In addition, the Neyman-Johnson technique is generalized to the
case of n predictor variables. [...] has heretofore been limited to [...]
predictor variables.


Comparison of the Analysis of Covariance and the Neyman-Johnson Technique

Consider two groups of individuals, designated G and H. Suppose
that measures are available for all individuals in both groups on a criterion
variable, y, and on r control variables, $x_1, x_2, \ldots, x_r$. The linear regressions
of y on the x's for the separate groups may be written

$$\hat{y}_G = b_{0G} + b_{1G}x_1 + b_{2G}x_2 + \cdots + b_{rG}x_r, \tag{1}$$

$$\hat{y}_H = b_{0H} + b_{1H}x_1 + b_{2H}x_2 + \cdots + b_{rH}x_r. \tag{2}$$

An analysis of covariance properly requires (1) that three hypotheses be
tested:

A. The variances of the observed y's about the regression surface are
equal for the groups G and H.
B. The corresponding regression slopes are identical for the two groups.
($b_{1G} = b_{1H}$; $b_{2G} = b_{2H}$; ...; $b_{rG} = b_{rH}$).
C. The intercepts are equal for the two groups.
($b_{0G} = b_{0H}$).

If either of the first two hypotheses is untenable, then any conclusions
which might be drawn from a test of Hypothesis C are to some extent vitiated.
If Hypothesis A is untenable, then theoretically the test of Hypothesis C
is illegitimate (although, practically speaking, the investigator may wish to

*This paper was written while the author was a Psychometric Fellow of the Educational
Testing Service, Princeton, New Jersey.


go ahead anyway). If Hypothesis B is rejected, the investigator may give
up the ghost, go ahead anyway, or, as is recommended here, apply the
Neyman-Johnson technique. For when the regression slopes are different, it is
clear from eqs. (1) and (2) that the difference ($\hat{y}_G - \hat{y}_H$) and, in particular, the
significance of the difference ($\hat{y}_G - \hat{y}_H$) is a function of the control variables, x.
In other words, certain specified segments of the populations giving rise to
groups G and H may differ significantly on the criterion variable, whereas
other segments do not. The segments are specified by locating certain sets of
values of $x_1, x_2, \ldots, x_r$, that is, certain regions of the "x-space." The "region of
significance" (8) of the Neyman-Johnson technique is defined as the set of
points in the x-space where one group is significantly better than the other on
the criterion variable.

In contrast with the analysis of covariance procedure, which tests
Hypotheses A, B, and C in that order and never investigates the region of
significance, Johnson (5) suggests, essentially, that Hypothesis C be tested
first. He gives no test for Hypothesis A, although he has stated that its
acceptance is necessary (6). If Hypothesis C is rejected (i.e., if the groups
are significantly different on [...]
calculated. What frequently h[...]
out to be so large that it include[...]. This waste can be
[...] Neyman-Johnson technique
[...] there is no significant difference between the re[...].
[...] quantities needed to com[...]
significance except $P_G$ and $P_H$ (see below) will already
[...] computation.

Computation of the Region of Significance for Any Number of Predictors 
Using maximum-likelihood methods, the Neyman-Johnson technique
sets up the ratio (the notation is the author's)

$$\frac{DX'}{(XUX')^{1/2}},$$




where D is a vector each of whose entries represents the difference between
corresponding regression slopes for the two groups.

$$D = [(b_{0G} - b_{0H}),\ (b_{1G} - b_{1H}),\ \ldots,\ (b_{rG} - b_{rH})]. \qquad (3)$$

X is a vector which specifies a set of fixed values of the control variables.

$$X = [x_0,\ x_1,\ x_2,\ \ldots,\ x_r], \qquad (4)$$

where $x_0$ is defined as unity.
U is a matrix found from

$$U = \sigma^2 (P_G^{-1} + P_H^{-1}), \qquad (5)$$

where $\sigma^2$ is the error variance of estimate for the total population including
both groups G and H. (The acceptance of Hypothesis A makes it legitimate
to use a weighted average of the error variances for groups G and H as an
estimate of $\sigma^2$.)
$P_G$ is the following matrix of data obtained from group G:

$$P_G = \begin{bmatrix}
n_G & \sum_a x_{1a} & \sum_a x_{2a} & \cdots & \sum_a x_{ra} \\
\sum_a x_{1a} & \sum_a x_{1a}^2 & \sum_a x_{1a}x_{2a} & \cdots & \sum_a x_{1a}x_{ra} \\
\sum_a x_{2a} & \sum_a x_{2a}x_{1a} & \sum_a x_{2a}^2 & \cdots & \sum_a x_{2a}x_{ra} \\
\vdots & \vdots & \vdots & & \vdots \\
\sum_a x_{ra} & \sum_a x_{ra}x_{1a} & \sum_a x_{ra}x_{2a} & \cdots & \sum_a x_{ra}^2
\end{bmatrix}$$

($n_G$ is the number of individuals in group G and a is a subscript denoting
the individual.) $P_H$ is the corresponding matrix for group H.
The groups may be considered significantly different on the criterion
for all X-vectors such that

$$\frac{(DX')^2}{XUX'} > (n_G + n_H)\left(\frac{1}{L_\gamma} - 1\right), \qquad (6)$$

where L is distributed according to a Beta distribution with [...]
and 2 degrees of freedom, and $L_\gamma$ is the value of L corresponding to the
$\gamma\%$ tail of the curve. The above is a concise statement of the Neyman-[...]




$$\frac{\hat{y}_G - \hat{y}_H}{\sigma_{(\hat{y}_G - \hat{y}_H)}}.$$

This is the critical ratio of the difference between the predicted criterion
scores of the two groups. It is a function of the vector X, and is of course a
random variable, since it depends on the regression slopes which are subject
to sampling fluctuation.
If we denote the vector $[b_{0G}, b_{1G}, \ldots, b_{rG}]$ by $B_G$ and take into account
the fact that the b's are distributed over an (r + 1)-variate normal surface
[...], it is apparent that $B_G X'$ is distributed [...] $(B_G - B_H)X'$ is distributed
normally with variance $X(\sigma^2 P_G^{-1} + \sigma^2 P_H^{-1})X' = XUX'$. Since by definitions
(1), (2), (3), and (4), $\hat{y}_G - \hat{y}_H = DX'$, then

$$\frac{\hat{y}_G - \hat{y}_H}{\sigma_{(\hat{y}_G - \hat{y}_H)}} = \frac{DX'}{(XUX')^{1/2}}. \qquad (7)$$

Under the null hypothesis that the true value of $\hat{y}_G$ is equal to the
true value of $\hat{y}_H$, the ratio [...]

L is distributed as the Beta distribution in the following way:
1 
p(L) dL = (Жс т) 1 — pe dL, (9 
$E УЕ 


where $N = n_G + n_H$ = the total number of observations.
Set up the new variable

$$F = N\,\frac{1 - L}{L}. \qquad (10)$$

Then

$$L = \frac{N}{N + F}, \qquad (11)$$

$$1 - L = \frac{F}{N + F}, \qquad (12)$$

$$\frac{dL}{dF} = -\frac{N}{(F + N)^2}, \quad \text{and} \qquad (13)$$
1 1 а —1/2 А 
p(F) dF = а Salita FU? dF. (14) 

af 2 5) 


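Compare this with Snedecor's F distribution with 1 and N − 4 degrees
of freedom:

$$p(F)\,dF = \frac{\Gamma\left(\frac{N-3}{2}\right)}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{N-4}{2}\right)}\,\frac{1}{(N-4)^{1/2}}\left(1 + \frac{F}{N-4}\right)^{(3-N)/2} F^{-1/2}\,dF. \qquad (15)$$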


For N of the order of 200, (14) is an excellent approximation to (15). But
an F distribution with 1 and N − 4 degrees of freedom is equivalent to the
distribution of the square of "Student's" t with N − 4 degrees of freedom.
For large N, t in turn is very nearly unit normally distributed, and thus:

$N\left(\frac{1}{L} - 1\right)$ is approximately distributed as the square of a unit
normal deviate.

This establishes the approximate equivalence of (6) and (8).
Now the boundary of the region of significance is found from (8), which
can be rewritten:

$$\frac{XD'DX'}{XUX'} > \theta^2. \qquad (16)$$

Multiplying both sides of the inequality by XUX' (which is always positive
since U is positive definite), we have

$$XD'DX' > \theta^2\,XUX', \qquad (17)$$

$$X(D'D - \theta^2 U)X' > 0. \qquad (18)$$

Setting

$$A = D'D - \theta^2 U, \qquad (19)$$




we find that the sets of values for which the groups are significantly different 
on the criterion are given by 


XAX' > 0. (20) 


Where XAX' < 0, there is no significant difference between the groups.
The boundary of the region of significance, XAX' = 0, is a quadratic surface
in r dimensions, where r is the number of predictor variables. A is computed
from (3), (5), and (19), where $\theta^2_{.05}$ (say) = 3.8416. It will usually be quite
a task to plot the actual boundary, especially if r > 2, but analytic-geometrical
methods are available (8) for this process.
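
The computation described in this section can be illustrated with a minimal sketch. It assumes the slope vectors, the data matrices for the two groups, and the error variance are already in hand; the function names (`invert`, `region_matrix`, `significantly_different`) and the small data set are hypothetical and are not taken from the paper. Because D enters only through the product D'D, the sign convention chosen for D does not affect the result.

```python
def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def region_matrix(b_g, b_h, p_g, p_h, sigma2, theta2=3.8416):
    """Return A = D'D - theta2 * U, with U = sigma2 * (P_G^-1 + P_H^-1)."""
    d = [g - h for g, h in zip(b_g, b_h)]          # slope differences, eq. (3)
    pg_inv, ph_inv = invert(p_g), invert(p_h)
    n = len(d)
    return [[d[i] * d[j] - theta2 * sigma2 * (pg_inv[i][j] + ph_inv[i][j])
             for j in range(n)] for i in range(n)]


def significantly_different(a, controls):
    """True when XAX' > 0 for X = [1, x_1, ..., x_r], as in eq. (20)."""
    x = [1.0] + list(controls)
    q = sum(x[i] * a[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))
    return q > 0


# Hypothetical data: one predictor, three cases per group at x = -1, 0, 1.
p = [[3, 0], [0, 2]]                  # [[n, sum x], [sum x, sum x^2]]
a = region_matrix([0.0, 2.0], [0.0, 0.0], p, p, sigma2=1.0)
print(significantly_different(a, [0.0]))
print(significantly_different(a, [5.0]))
```

With equal intercepts and unequal slopes, as in this made-up case, the groups are not distinguishable near x = 0 and become distinguishable only far enough from it, which is the situation the region of significance is meant to describe.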


REFERENCES 
1. Gulliksen, H., and Wilks, S. S. Regression tests for several samples. Psychometrika,
1950, 15, 91-114.

2. Hansen, Carl W. Factors associated with successful achievement in problem solving
in sixth grade arithmetic. J. educ. Res., 1944, 38, 111-118.

3. Johnson, Donovan A. An experimental study of the effectiveness of films and film
strips in teaching geometry. J. exp. Educ., 1949, 17, 363-372.

4. Johnson, Harry C. The effect of instruction in mathematical vocabulary upon problem
solving in arithmetic. J. educ. Res., 1944, 38, 97-110.

5. Johnson, P. O., and Fay, L. The Neyman-Johnson technique, its theory and
applications. Psychometrika, 1950, 15, 349-367.

6. Johnson, P. O., and Hoyt, C. On determining three-dimensional regions of significance.
J. exp. Educ., 1947, 15, 342-353.

7. Johnson, P. O., and Neyman, J. Tests of certain linear hypotheses and their
applications to some educational problems. Statistical research memoirs, 1936, 1, 57-93.

8. Osgood, W. F., and Graustein, W. C. Plane and solid analytic geometry. New York:
The Macmillan Company, 1920.

9. Snedecor, George W. Statistical methods (3rd Ed.). Ames, Iowa: Iowa State Coll.
Press, 1940.

10. Wilks, S. S. Mathematical statistics. Princeton, N. J.: Princeton Univ. Press, 1943.

Manuscript received 1/18/52

Revised manuscript received 1/24/53


PSYCHOMETRIKA—VOL. 18, NO. 3
SEPTEMBER, 1953


THE STABILITY OF THE FACTORIAL PATTERN OF AIRCREW 
CLASSIFICATION TESTS IN FOUR ANALYSES* 


VIRGINIA ZACHERT 
AND


GABRIEL FRIEDMAN 


LACKLAND AIR FORCE BASE, TEXAS 


This study consists of four factor analyses of the Army Air Forces
Aircrew Classification Batteries. The first was an analysis of the 1945 wartime
battery, while the other three were analyses of the 1947 postwar battery,
consisting of essentially the same variables, but using different samples.
Eleven factors were found which had been identified and reported in previous
analyses. An additional factor, possibly an artifact, was identified as an
age-education doublet. The only factor which differed significantly in the
analyses was pilot or flying interest. These factor analyses show that the
factorial content of the tests remains quite similar in both wartime and
postwar populations.


Introduction

During World War II, trained psychologists developed and administered
the Army Air Forces Aircrew Classification Battery. An account of this work
can be found in the Army Air Forces Aviation Psychology Program Research
Reports (3, 4). A report of the validity of the battery under peacetime
conditions and the results of postwar research on the improvement and
revision of the battery are found in (1). The purpose of this paper is to compare
the factorial content of the battery on postwar samples with wartime samples
and to draw inferences concerning the stability of the factorial pattern.

Analysis of the Data

Four separate analyses were made. The first was on the 23-variable
June 1945 Aircrew Classification Battery. This wartime sample consisted of
8574 unclassified aviation trainees tested at Keesler Field during the summer
of 1945. The other three were on the February 1947 Aircrew Classification
Battery, which was administered to 1511 basic pilot trainees between
February, 1948, and April, 1949. The trainees were divided into two groups:
1000 trainees with no previous flying experience and 511 with previous
flying experience. [...] is given in (6), and descriptions of the test variables
are given in (3), (4), and (6).

*The data used in this study were collected as part of the United States Air
Force [...] and Development Program and described in Research
Bulletin 52-16. The opinions or conclusions contained in this report are those of the
authors. They are not to be construed as reflecting the view or indorsement of the
Department of the Air Force.


Factors were extracted from these matrices by Thurstone’s centroid 
method (5). In all cases, 
to be in the matrix. The 


factors were numerous, and the most liberal were usually used. The centroid 
loadings were rotated to 


Interpretation of Results 


The significant loadings (.30 or greater) of the variables on each factor
are given below. The $\Sigma a^2/k$ is given by factors to show the average variance
contributed by each factor for all the variables in the battery, and permits
a comparison of the variance extracted on each factor in the different analyses.
The factors are the same as those identified and described by French (2) and
Guilford (3). The factor loadings for 12 interpreted factors are presented
in Table 2.
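
As a minimal sketch of the index just described, the average variance contributed by one factor is the sum of its squared loadings divided by k, the number of variables in the battery. The function name and the loadings below are hypothetical and are not taken from Table 2; note that the sum runs over every variable in the battery, including those whose loadings fall below the .30 reporting threshold.

```python
def average_variance(loadings):
    """Return sum(a^2) / k for one factor, given its loading on each of the k variables."""
    k = len(loadings)
    return sum(a * a for a in loadings) / k


# Hypothetical loadings of four variables on a single factor.
example = [0.6, 0.5, 0.1, 0.0]
print(round(average_variance(example), 3))
```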

The identification of the Socio-Economic Background factor differs
somewhat from its interpretation in wartime research, in which it was termed
Mathematical Background (3). Attention is also called to the fact that in
the case of the factor identified as Pilot or Flying Interest, the $\Sigma a^2/k$ is
larger for the total postwar study, since it included the previous flying
experience variable. The factor termed Age-Education Doublet is presumably
an artifact. The minimum age in the samples being 20 years, it is hypothesized
that the men with more education were in general older: hence the
positive relation between age and education.


is also shown. 
2. The only factor which had any significant difference in the analyses
was one identified as Pilot or Flying Interest. This factor is identified by
the strong relationship with the variable of previous flying experience.




TABLE 1
Comparison of Communalities of the Four Analyses*

                                              Keesler       Postwar Sample
Variable                                      Sample     Total   NFE†   PFE‡

Age                                             --         23     19     20
Education                                       --         36     38     31
Arithmetic Reasoning, CI206C                    57         56     62     57
Biographical Data (Bomb. O.), CE602D            25         40     41     38
Biographical Data (Pilot), CE602D               57         48     49     40
Coordinate Reading, CP224B                      55         54     56     58
Dial and Table Reading, CP621-622A              67         65     65     78
General Information, CE505F                     55         78     56     64
Instrument Comprehension, CI616C                45         48     44     43
Mechanical Information, CI905B                  60         60     59     69
Mechanical Principles, CI903B                   65         64     64     65
Numerical Operations I (Front), CI702B          68         65     66     63
Numerical Operations II (Back), CI702B          78         67     67     70
Practical Judgment, CI301C                      34         34     29     40
Reading Comprehension, CI614H                   59         49    [?]     22
Spatial Orientation I, CP501B                   54         53    [?]    [?]
Spatial Orientation II, CP503B                  44         40    [?]    [?]
Speed of Identification, CP610A                [?]        [?]    [?]    [?]
Complex Coordination, CM701E                    55         22     88     54
Discrimination Reaction Time, CP611D            44         25    [?]    [?]
Finger Dexterity, CM116A                        29         32     32    [?]
Rotary Pursuit, CP410B                          43         39     41    [?]
Rudder Control, CM120C                         [?]        [?]    [?]     30
Two-Hand Coordination, CM101B                   --         50    [?]    [?]
Two-Hand Pursuit, CM810A                       [?]        [?]    [?]    [?]
Previous Flying Experience                      --         66     --     --
Pedestal Sight Manipulation, CM724A            [?]        [?]    [?]    [?]

*Decimal points omitted.
†NFE: No Flying Experience.
‡PFE: Previous Flying Experience.




TABLE 2
Rotated Factor Loadings for Twelve Factors Based on Four Samples

                                                   Keesler      Postwar Samples
Factor                                             Sample    Total   NFE*   PFE†

Mechanical Experience
  Tests: Mechanical Information, CI905B             .72       .68    [?]    [?]
         General Information, CE505F                [?]       .53    .60    .49
         Mechanical Principles, CI903B              .52       [?]    .48    .19
         Biographical Data (Pilot), CE602D          .45       .51    .50    .40
         Σa²/k                                      .07       .06    .07    .06

Psychomotor Coordination
  Tests: Complex Coordination, CM701E               .49       [?]    .52    .52
         Rotary Pursuit, CP410B                     .52       .33    .42    .35
         Rudder Control, CM120C                     .55       .44    .42    [?]
         Two-Hand Coordination, CM101B              --        .41    .51    .44
         Two-Hand Pursuit, CM801A                   .40       --     --     --
         Pedestal Sight Manipulation, CM724A        [?]       --     --     [?]
         Σa²/k                                      .06       .08    .05    .04

Perceptual Speed
  Tests: Spatial Orientation I, CP501B              .65       .64    .64    .66
         Speed of Identification, CP610A            .58       .63    .61    .56
         Spatial Orientation II, CP503B             .49       .48    .48    .42
         Coordinate Reading, CP224B                 .38       .38    .42    .39
         Dial and Table Reading, CP621-622A         .28       .31    .35    .32
         Σa²/k                                      .07       .06    .06    .06

Socio-Economic Background
  Tests: Biographical Data (Bomb. O.), CE602D       .38       .60    .60    .56
         Education                                  --        .44    .46    .45
         Biographical Data (Pilot), CE602D          .53       .28    .36    [?]
         Σa²/k                                      .02       .03    .03    .03

Numerical Facility
  Tests: Numerical Operations I (Front), CI702B     .76       .76    .74    .69
         Numerical Operations II (Back), CI702B     .78       .78    .76    .78
         Dial and Table Reading, CP621-622A         .50       .47    .48    .43
         Arithmetic Reasoning, CI206C               .38       .38    .38    .35
         Coordinate Reading, CP224B                 .52       [?]    .35    .33
         Σa²/k                                      .08       .08    .07    .07

General Reasoning
  Tests: Arithmetic Reasoning, CI206C               .38       .49    .49    .42
         Mechanical Principles, CI903B              .38       .34    .40    .30
         Σa²/k                                      .03       [?]    [?]    .03

Psychomotor Precision or Finger Dexterity
  Tests: Finger Dexterity, CM116A                   .88       [?]    [?]    .42
         Rotary Pursuit, CP410B                     .32       .42    .38    .44
         Σa²/k                                      .02       [?]    [?]    [?]

*No Flying Experience.
†Previous Flying Experience.


TABLE 2 (Cont.)
Rotated Factor Loadings for Twelve Factors Based on Four Samples

                                                   Keesler      Postwar Samples
Factor                                             Sample    Total   NFE    PFE

Pilot or Flying Interest
  Tests: General Information, CE505F                .49       .57    [?]    .54
         Rudder Control, CM120C                     .15       .66    .30    .36
         Previous Flying Experience                 --        [?]    --     --
         Σa²/k                                      .02       .06    .02    .03

Spatial Relations
  Tests: Dial and Table Reading, CP621-622A         .42       .39    .36    .49
         Coordinate Reading, CP224B                 .40       .87    [?]    .38
         Complex Coordination, CM701E               .37       [?]    .39    .38
         Two-Hand Coordination, CM101B              --        .36    .36    .35
         Instrument Comprehension, CI616C           .33       .32    .34    [?]
         Discrimination Reaction Time, CP611D       .97       .32    .36    .36
         Σa²/k                                      .04       .04    .04    .05

Verbal Comprehension
  Tests: Reading Comprehension, CI614H              .58       .56    .67    .56
         Practical Judgment, CI301C                 .42       .41    .40    .44
         Arithmetic Reasoning, CI206C               [?]       .33    .40    .44
         Σa²/k                                      .05       .03    .05    .04

Visualization
  Tests: Mechanical Principles, CI903B              .38       .46    .37    [?]
         Instrument Comprehension, CI616C           .24       .32    .27    .08
         Σa²/k                                      .02       .03    .02    .02

Age-Education Doublet
         Age                                        --        .36    [?]    .38
         Education                                  --        .21    .28    .20
         Σa²/k                                      --        .01    .01    .01




REFERENCES 


1. Dailey, J. T., and Gragg, D. B. Postwar research on the classification of aircrew.
San Antonio, Tex.: Human Resources Research Center, Lackland Air Force Base,
November 1949. (Research Bulletin 49-2.)

2. French, J. W. The description of aptitude and achievement tests in terms of rotated
factors. Psychometric Monograph No. 5. Chicago: Univ. Chicago Press, 1951.

3. Guilford, J. P. (Ed.) Printed classification tests. Army Air Forces Aviation Psychology
Program Research Reports, Report No. 5. Washington: U. S. Government Printing
Office, 1947.

4. Melton, A. W. (Ed.) Apparatus tests. Army Air Forces Aviation Psychology Program
Research Reports, Report No. 4. Washington: U. S. Government Printing Office, 1947.

5. Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947.

6. Zachert, V., and Friedman, G. Factorial comparisons of two aircrew classification
batteries with and without the variable of previous flying experience. San Antonio,
Tex.: Human Resources Research Center, Lackland Air Force Base, April 1952.
(Research Bulletin 52-16.)

7. Zimmerman, W. S. A simple graphical method for orthogonal rotation of axes.
Psychometrika, 1946, 11, 51-56.

Manuscript received 10/11/52
Revised manuscript received 2/14/53




PSYCHOMETRIKA—VOL. 18, NO. 3 
SEPTEMBER, 1953 


RELIABILITY FORMULAS THAT DO NOT ASSUME
EXPERIMENTAL INDEPENDENCE


Louis GUTTMAN 


THE ISRAEL INSTITUTE OF APPLIED SOCIAL RESEARCH 


Tautologies are established for the reliability coefficient $\rho_t^2$ of the sum
of n part scores. It is not assumed that the part scores are experimentally
independent of each other, nor that the parts are equivalent to each other.
The tautologies show the exact role played by experimental dependence
and nonequivalence in [...] the reliability of the sum. The
formal algebra is ap[...] of the same test, as well as in the sense of
a universe of par[...] the empirical meanin[...] require information
f[...]). These can take the form only of lower bounds to $\rho_t^2$, four of
which are developed.


1. Introduction 

The posing of the problem of speeded (or noncompleted) tests by
Gulliksen (2), Cronbach and Warrington (1), and others, emphasizes anew
the need for scrutinizing the foundations of the various reliability coefficients
popularly in use. The Spearman-Brown coefficients, the Kuder-Richardson
formulas, the Guttman lower bounds, and all related coefficients which
refer to the reliability of a sum and which are based on but a single trial,
have an assumption in common. They all either implicitly or explicitly
assume that the scores being summed are experimentally independent.

For noncompleted tests, it is apparent that such an assumption of
independence is incorrect. If each respondent is scored zero on each item
that he does not attempt, then there is, in general, a serial experimental
dependence among the later test items, especially in time-limit tests.
Even in power tests, where all items are a[...] by everyone,
experimental dependence may hold between items, as when answering a given
item correctly depends on how one answers the preceding items. Many other
examples can be suggested where the assumption of independence of
experimental errors is not appropriate.

The purpose of the present paper is to present some general reliability
formulas that make no assumptions at all about experimental independence.
[...] the reliability of the sum of any item scores,
whether independence holds or not.

The formulas developed here are mathematical identities [...] use.
Their importance [...] from which different
practical formulas are easily derived. For a formula to be practical, it has to
make some assumption as to the nature of the experimental dependence
among items. Our tautologies show the exact mathematical nature, location,
and role of such assumptions.
The tautologies can be modified to yield practical formulas, especially
lower bounds to $\rho_t^2$. One example of an applied formula which ensues is for
speeded or noncompleted tests, based on an exact statement of the serial
dependence involved. Another practical example is that of the limiting case
of experimental independence; the resulting formulas coincide with certain
previously known ones, as should be expected. The specification of any
other type of experimental dependence leads just as immediately to the
appropriately modified and practical formulas from our general ones.

Both the theoretical and practical types of formulas of the present
paper avoid also any assumptions concerning the mutual equivalence of
the items within the test. Instead, they show the exact role of the different
aspects of nonequivalence, and how no knowledge of the nature of the
nonequivalence is needed in practice.


2. The Respective Frameworks of Retests and of Parallel Tests

We are concerned with the reliability of the sum of n part scores, where
$n \geq 2$. Often a test will be composed of n items, and the part scores are
simply the scores on the respective items. In other cases, the test may be
composed of m > n items, and each part score may be based on more than
one item; for example, the test may be split into two parts and thus have
n = 2, although each part may include dozens of items.

We assume nothing about "equivalence" of parts of the test in any
sense whatsoever. For example, a test of 100 dichotomous items may be
split to have n = 2, with one part score based on but one item and the other
part score based on 99 items.*


The notion of reliability with which we are concerned relates to a uni- 
verse—usually hypothetical—of repeated experiments with the same test 
on the same population of respondents. 


Each part score has its own reliability, which is separately defined.
To study the reliability of part scores separately [...]
based on but a single item [...] can be done in practice [...]
(a) actual repeated experiments, provided there [...]
the next; or (b) correl[...]
of variables [...] experi[...] part score. The latter
technique yields a lower bound to the retest reliability coefficient of the
part score [cf. (3), especially pp. 277-279].

*It may be further commented that some items may be included in both parts,
as far as our general formulas below are concerned: the resulting experimental dependence
is but a special case subsumed under the general formulas.




It is in the test as a whole, or the sum of $n \geq 2$ part scores, that our
immediate interest lies, and not in the part scores themselves. To study the
reliability of a sum, the two techniques of the preceding paragraph are again
available, but now also a third technique is possible, based on a single trial
of the test alone. Since at least two part scores are available, internal
correlations can be studied on but a single trial, and lower bounds can be
established in practice to the theoretical retest coefficient from an infinite number
of repeated experiments of the test. The internal correlations give some
evidence as to the reliability of each of the part scores separately, and our
analysis capitalizes on this information to learn about the reliability of the
sum.

The kind of reliability coefficient we have implied thus far here has
been called the retest coefficient, referring to repeated experiments on the
same test. Our algebra and formulas can be seen to hold, with modification
only of interpretation, if a universe of parallel tests is thought of, instead
of a universe of experiments with the same test. As long as the universe
remains hypothetical, whether of experiments or of parallel tests, only the
same numerical lower bounds can be derived in practice.

The difference between retest theory and parallelism theory occurs
when two or more experiments are actually made on the same test, and two
or more parallel tests are actually constructed and used [cf. (4)]. Since the
present paper is aimed at practical formulas based on only a single trial, its
results hold equally well for parallelism theory where only a single test is
used in practice. The difference is in interpretation and not in the formal
algebra of the present paper. For convenience, we shall use the terminology
of retest theory here, rather than of parallelism theory, although the
terminologies are interchangeable within the limits of the present algebra.

It is important to distinguish parallelism between tests from parallelism
within tests. Parallelism within a test refers to equivalence of the part scores
of a test to each other. This is assumed most fully, for example, in the
Spearman-Brown "prophecy" formula, and to a considerable extent in the
Kuder-Richardson formulas. As already emphasized above, equivalence of part
scores is in no way assumed in the present paper. Only parallelism between
tests or experiments is of relevance to our formulas, and not within tests.
And since we shall deal in practice with only a single trial or test, leaving
the rest of the universe hypothetical, no problem of actually studying
parallelism exists as far as the practical formulas of the present paper are concerned.
3. Definitions and Notation

We need a triple-subscript notation, one subscript for the respondent,
one for the part score, and one for the trial (or parallel test). Let $x_{ijk}$ be the
score of the ith respondent on the jth part score on the kth trial.

The population of respondents will be considered indefinitely large, so
that i has an unlimited range. Actually, our basic formulas hold for a finite
population, but observability of the needed parameters from but a single
[...] (3); otherwise, sampling error due to [...]

The total number of part scores for the test is some finite number n,
where $n \geq 2$, so the range for j is j = 1, 2, ..., n.

The total score on the test for person i on trial k will be denoted by
$t_{ik}$, and by definition:

$$t_{ik} = \sum_{j=1}^{n} x_{ijk}. \qquad (1)$$

The mean or "expected" value for person i on the jth part score over
all trials will be denoted by $X_{ij}$, or

$$X_{ij} = E_k\, x_{ijk}. \qquad (2)$$

[...]

$$T_i = E_k\, t_{ik}. \qquad (3)$$

The $X_{ij}$ and $T_i$ can be thought of as what are conventionally called "true"
scores on the parts and total test, respectively. Taking expectations over
both members of (1), and using (2) and (3), we derive the identity

$$T_i = \sum_{j=1}^{n} X_{ij}. \qquad (4)$$

[...] difference between observed and expected values,
$x_{ijk} - X_{ij}$. The variance of these errors for the ith person will be denoted
by $\sigma^2_{x_j i}$, or

$$\sigma^2_{x_j i} = E_k\,(x_{ijk} - X_{ij})^2. \qquad (5)$$

Correspondingly, the variance of the unreliability errors of person i on the
total test is defined to be

$$\sigma^2_{t i} = E_k\,(t_{ik} - T_i)^2. \qquad (6)$$

The reliability coefficient itself depends not only on the variations
within people as defined by (6), but also on the variations among people.



To study the latter, we need expectations also over the subscript i. Let $\bar{x}_j$
denote the general mean of the jth part score over all persons as well as
trials,

$$\bar{x}_j = E_i\, X_{ij} = E_i E_k\, x_{ijk}, \qquad (7)$$

and let $\bar{t}$ denote the corresponding general mean for total test scores,

$$\bar{t} = E_i\, T_i = E_i E_k\, t_{ik}. \qquad (8)$$

Using (1), (8), and (7), we obtain the identity:

$$\bar{t} = \sum_{j=1}^{n} \bar{x}_j. \qquad (9)$$

The general variance, over persons and trials, for the jth part score will
be denoted by $\sigma^2_{x_j}$, and

$$\sigma^2_{x_j} = E_i E_k\,(x_{ijk} - \bar{x}_j)^2. \qquad (10)$$

Similarly, the general variance for the total test scores will be denoted by
$\sigma_t^2$, and

$$\sigma_t^2 = E_i E_k\,(t_{ik} - \bar{t})^2. \qquad (11)$$

The reliability coefficient of the total scores depends on how large the
error variances (6) are on the average compared to $\sigma_t^2$. The mean error
variance within people will be denoted by $\epsilon^2$, where

$$\epsilon^2 = E_i\, \sigma^2_{t i}. \qquad (12)$$

The reliability coefficient itself is defined as

$$\rho_t^2 = 1 - \frac{\epsilon^2}{\sigma_t^2}. \qquad (13)$$

We shall see that $\rho_t^2$ is bounded between zero and unity. If $\epsilon^2 = 0$, then
there is no variation at all in the $t_{ik}$ from trial to trial, and $\rho_t^2 = 1$, or the
test is perfectly reliable for the given population. The maximum that $\epsilon^2$ can
attain is $\sigma_t^2$, as will be shown, in which case $\rho_t^2 = 0$, or the test is said to have
no reliability for the population; the variation within people equals the total
variation. Intermediate values of $\rho_t^2$ indicate intermediate ratios of
within-persons variance to the general variance.
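
The definitions of this section can be traced on a small finite table of scores. The sketch below is only an illustration under stated assumptions: the universe of trials is replaced by a handful of hypothetical trials, every person has the same number of trials, and the names `mean` and `reliability` are introduced here and are not part of the paper.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def reliability(x):
    """x[i][j][k]: score of person i on part j at trial k (hypothetical finite table)."""
    n_trials = len(x[0][0])
    # Total scores t_ik: sum of the part scores at each trial, definition (1).
    t = [[sum(part[k] for part in person) for k in range(n_trials)] for person in x]
    true_total = [mean(row) for row in t]          # expected total per person, (3)
    grand_mean = mean(true_total)                  # general mean of totals, (8)
    # Within-person error variance (6), averaged over people as in (12).
    eps2 = mean(mean((tik - ti) ** 2 for tik in row)
                for row, ti in zip(t, true_total))
    # General variance of total scores over persons and trials, (11).
    var_t = mean(mean((tik - grand_mean) ** 2 for tik in row) for row in t)
    return 1 - eps2 / var_t, eps2, var_t           # reliability as defined in (13)


# Two persons, two parts, two trials (made-up numbers).
scores = [
    [[1, 3], [2, 2]],
    [[5, 5], [4, 6]],
]
rho2, eps2, var_t = reliability(scores)
print(rho2, eps2, var_t)
```

In this toy table each person's total moves by one point around its own mean while the two persons' means are six points apart, so most of the general variance lies between people and the coefficient is close to unity.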

It should be noted that no empirical assumptions of any kind have
been made in arriving at definition (13).* Equations (1) through (13) are
all either definitions or derived mathematical identities, and lay no
restrictions on the test, people, or trials.

*[...] that the expected value[...] conver[...]. Such convergence always is
assured if total scores are limited in magnitude to a finite
range, as they invariably are for psy[...]

The practical problem is to obtain empirical information about $\epsilon^2$,
especially from but a single trial. It has been shown elsewhere (3) how $\sigma_t^2$
can be computed exactly from a single trial. It has also been shown how
bounds to $\epsilon^2$ and $\rho_t^2$ can be set from but a single trial, using the special
assumption of experimental independence among part scores. We now wish to study
$\epsilon^2$ and $\rho_t^2$, but without necessarily assuming experimental independence of
the parts.


4. Further Parameters

The covariance, over experiments, of the errors of unreliability between
the jth and the gth part score for individual i will be denoted by

$$\gamma_{x_g x_j i} = E_k\,(x_{igk} - X_{ig})(x_{ijk} - X_{ij}). \qquad (14)$$

If the two part scores are experimentally independent for the ith person,
then the covariance $\gamma_{x_g x_j i}$ vanishes (3, 263-266). If the part scores are
dependent for person i, then $\gamma_{x_g x_j i}$ in general differs from zero. Different laws
of dependence yield different patterns of values for the n(n − 1)/2 different
covariances defined by (14). Practical use of our formulas below will depend
on specifying the particular pattern for the particular data.

Ultimately, we shall not need the law of (14) for each person, but only
the quantity we shall call $\delta$, where

$$\delta = \sum_{g=1}^{n} \sum_{\substack{j=1 \\ j \neq g}}^{n} E_i\, \gamma_{x_g x_j i}. \qquad (15)$$

$\delta$ is defined by first averaging each covariance over all people, and then by
summing over all covariances, omitting self-covariances or variances proper.
In the case of experimental independence it must be that $\delta = 0$. The size
and sign of $\delta$ depend on the nature of the experimental dependence of all
the part scores.

As usual, the covariance of a variable with itself is its variance, or
(14) yields $\sigma^2_{x_j i}$ when g = j:

$$\gamma_{x_j x_j i} = \sigma^2_{x_j i}. \qquad (16)$$

We also need parameters of the reliable or "true" parts of the
observations. The variance of the $T_i$ over people will be denoted by

$$\sigma_T^2 = E_i\,(T_i - \bar{t})^2. \qquad (17)$$

†The only assumption used for this purpose is [...]
that there is no copying of one person from another [...] or any other form of experimental
dependence between people (3, 266).

The corresponding variances of the $X_{ij}$ are $\sigma^2_{X_j}$, where

$$\sigma^2_{X_j} = E_i\,(X_{ij} - \bar{x}_j)^2. \qquad (18)$$

The covariance between $X_{ig}$ and $X_{ij}$ over people is

$$\gamma_{X_g X_j} = E_i\,(X_{ig} - \bar{x}_g)(X_{ij} - \bar{x}_j). \qquad (19)$$

Again, (19) reduces to (18) for g = j.

Finally, we need the general covariance, over people and trials, between
each pair of part scores:

$$\gamma_{x_g x_j} = E_i E_k\,(x_{igk} - \bar{x}_g)(x_{ijk} - \bar{x}_j). \qquad (20)$$

Formula (20) reduces to (10) for g = j.

If all the part scores were equivalent, then the $\sigma^2_{x_j}$ would be mutually
equal for all j. Let $\alpha^2$ denote the variance over items of the $\sigma^2_{x_j}$:

$$\alpha^2 = \frac{1}{n}\sum_{j=1}^{n} \sigma^4_{x_j} - \left(\frac{1}{n}\sum_{j=1}^{n} \sigma^2_{x_j}\right)^2. \qquad (21)$$

Then equivalence would imply $\alpha^2 = 0$. We shall not assume this. Indeed
in practice, it is generally true that $\alpha^2 > 0$. It turns out that practical
formulas for $\rho_t^2$, in the form of lower bounds, are obtainable with no knowledge
or assumptions at all about $\alpha^2$.

It is useful to define two further parameters, $\Gamma_1$ and $\Gamma_2$, where

$$\Gamma_1 = \sum_{g=1}^{n} \sum_{\substack{j=1 \\ j \neq g}}^{n} \gamma_{X_g X_j} = \sum_{g=1}^{n}\sum_{j=1}^{n} \gamma_{X_g X_j} - \sum_{j=1}^{n} \sigma^2_{X_j} \qquad (22)$$

and

$$\Gamma_2 = \sum_{g=1}^{n} \sum_{\substack{j=1 \\ j \neq g}}^{n} \gamma^2_{X_g X_j} = \sum_{g=1}^{n}\sum_{j=1}^{n} \gamma^2_{X_g X_j} - \sum_{j=1}^{n} \sigma^4_{X_j}. \qquad (23)$$

Thus, $\Gamma_1$ is twice the sum of the n(n − 1)/2 different covariances defined
by (19); and $\Gamma_2$ is twice the sum of their squares.

If the part scores were all mutually equivalent, then all n(n − 1)/2
covariances defined by (19) would be equal. Let $\beta^2$ denote the variance of
the covariances. Then it is easily seen from (23) and (22) that

$$\beta^2 = \frac{\Gamma_2}{n(n-1)} - \left[\frac{\Gamma_1}{n(n-1)}\right]^2. \qquad (24)$$

Equivalence of part scores would imply $\beta^2 = 0$. We shall not assume this,
for in practice $\beta^2 > 0$ in general. Unlike $\alpha^2$, it is possible often in practice to
estimate $\beta^2$. For example, in the case of experimental independence, $\Gamma_1$ and
$\Gamma_2$ can be computed exactly from but a single trial, leading to the lower
bound $\lambda_2$ (3). However, the computations may often be cumbersome. We
shall arrive at a useful formula which requires no knowledge of $\beta^2$.

A final parameter that we shall need will be denoted by $c^2$. If we let
$\rho_{X_g X_j}$ denote the correlation coefficient over people of the $X_{ij}$ and $X_{ig}$, that is:

$$\rho_{X_g X_j} = \frac{\gamma_{X_g X_j}}{\sigma_{X_g}\,\sigma_{X_j}}, \qquad (25)$$

then we define $c^2$ to be the double sum:

[...] (26)

Since the parentheses on the ri[...]
$c^2$ is always non-negative. If all parts of the test were equivalent, then all
[...] would be unity. In such a case, we would have [...]
5. Some Fundamental Identities; the First Lower Bound

One of the most important identities of reliability theory is the following:

$$\sigma_t^2 = \epsilon^2 + \sigma_T^2, \qquad (27)$$

where the three terms are defined by (11), (12), and (17), respectively.
This states that the general variance of the total test scores is the sum of
the mean error variance within people and the variance of "true" scores.
Equality (27) follows from definitions (11), (12), and (17),
with no assumptions [...],* from the identity:

$$t_{ik} - \bar{t} = (t_{ik} - T_i) + (T_i - \bar{t}). \qquad (28)$$

Squaring both sides and taking expectations [...]

$$E_i E_k\,(t_{ik} - T_i)(T_i - \bar{t}) = E_i\,(T_i - \bar{t})\, E_k\,(t_{ik} - T_i), \qquad (29)$$

and the expectation on the right is obviously zero.

*This is in contrast to some conventional formulations that need to introduce special
hypotheses of zero correlations between "true" scores and errors in order to arrive at a
formula like (27).

Using (27), we can rewrite (13) also as

$$\rho_t^2 = \frac{\sigma_T^2}{\sigma_t^2}. \qquad (30)$$

This shows that $\rho_t^2 \geq 0$. Also, from (27), we see that $\sigma_T^2 \leq \sigma_t^2$. Hence, from
(30), we have $\rho_t^2 \leq 1$. Therefore, we have now proved the previous assertion
that $\rho_t^2$, as defined by (13), varies between zero and unity.
A further identity, proved in a manner similar to that for (27), is as follows:

$$\sigma_{x_g x_j} = E_i\,\gamma_{x_g x_j i} + \gamma_{X_g X_j}. \qquad (31)$$

This follows from definitions (20), (14), and (19), as can be seen by expanding the brackets and taking expectations in the identity

$$(x_{igk} - \xi_g)(x_{ijk} - \xi_j) = [(x_{igk} - X_{ig}) + (X_{ig} - \xi_g)]\,[(x_{ijk} - X_{ij}) + (X_{ij} - \xi_j)]. \qquad (32)$$

Sum both members of (20) over g, use (1) and (9), sum again over j and again use (1) and (9), and then use (11), to establish the identity:

$$\sigma_t^2 = \sum_{g=1}^{n}\sum_{j=1}^{n} \sigma_{x_g x_j}. \qquad (33)$$

Similarly, from summing over (19) it follows that

$$\sigma_T^2 = \sum_{g=1}^{n}\sum_{j=1}^{n} \gamma_{X_g X_j}. \qquad (34)$$

Still another identity of the same sort follows from summing over both members of (14) and using (6):

$$\sigma^2_{t_i} = \sum_{g=1}^{n}\sum_{j=1}^{n} \gamma_{x_g x_j i}. \qquad (35)$$

Taking expectations over i in (35) and using (12) establishes the next identity,

$$\epsilon^2 = \sum_{g=1}^{n}\sum_{j=1}^{n} E_i\,\gamma_{x_g x_j i}. \qquad (36)$$

From (15) and (16), we can rewrite (36) as

$$\epsilon^2 = \sum_{j=1}^{n} E_i\,\sigma^2_{x_j i} + \delta. \qquad (37)$$


If $\delta$ is zero in (37), as in the case of experimental independence of the parts of the test, then (37) would reduce to a statement that the total error variance is precisely the sum of the part error variances. But if $\delta \neq 0$, this nonzero value must also be taken into account in (37).

To transform (37) into a more practical form, we notice that (31) becomes, for g = j,

$$\sigma^2_{x_j} = E_i\,\sigma^2_{x_j i} + \sigma^2_{X_j}. \qquad (38)$$

Summing both members of (38) over j yields

$$\sum_{j=1}^{n} \sigma^2_{x_j} = \sum_{j=1}^{n} E_i\,\sigma^2_{x_j i} + \sum_{j=1}^{n} \sigma^2_{X_j}. \qquad (39)$$

Then, using (39) in (37), we obtain finally the very important identity:

$$\epsilon^2 = \sum_{j=1}^{n} \sigma^2_{x_j} - \sum_{j=1}^{n} \sigma^2_{X_j} + \delta. \qquad (40)$$

Identity (40) gives us immediately a practical upper bound to $\epsilon^2$:

$$\epsilon^2 \le \sum_{j=1}^{n} \sigma^2_{x_j} + \delta. \qquad (41)$$

Each $\sigma^2_{x_j}$ is observable from but a single trial (3, 281, remembering that only assumption C is used), and $\delta$ is specified by the law of dependence of the given data. For the case of experimental independence, with $\delta = 0$, (41) leads to the lower bound $\lambda_1$ to $\rho_t^2$ as stated in (3). The more general lower bound, from (41) and (13), will be denoted by $\lambda_1^*$, where

$$\lambda_1^* = 1 - \frac{\sum_{j=1}^{n} \sigma^2_{x_j} + \delta}{\sigma_t^2}. \qquad (42)$$

Clearly,

$$\lambda_1^* \le \rho_t^2 \le 1. \qquad (43)$$

6. Further Identities and the Second Lower Bound

The loss of information which makes $\lambda_1^*$ a lower bound is due to the fact that $\sum_{j=1}^{n} \sigma^2_{X_j}$ is not observable on a single trial. At least two experimentally independent trials are needed to establish the variation in "true" scores. However, we can improve on $\lambda_1^*$ even from a single trial, by studying the sum of the $\sigma^2_{X_j}$ more carefully, as we shall now do.
We shall establish the following identity: У 


Xk = үг H Ps + nat + e). (44) 


LOUIS GUTTMAN 235 
One way of doing this is to rewrite (26) by expanding its right member, 
remembering (25): | 


Ф = ( 2 EN авы - (45) 


j=l је1 g-l 


From (23), and then (21), 
2: Уу, = + no? + A 25 x. 


j-1 971 i 


(46) 


Using (46) in (45), collecting and transposing terms, multiplying through 


by n/(n — 1), and then taking square roots, yields (44). 

In the right of (44), а? and e^ are not observable from a single trial. 
But Г, is possibly observable. The observability of Г, depends—according 
to (23) and (31)—оп the observability of the 7,2; and on Ёш. Now 
the у, are observable from а single trial, as shown in (3). The Буги 
are to have specified an appropriate law of experimental dependence for 
g = j. Hence, given the specification, the yx,x; can be observable, and Г 
(also T;) ean be observable. We do not need information on а? and e, then, 
but can use the practical inequality that follows from (44): 


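$$\sum_{j=1}^{n} \sigma^2_{X_j} \ge \sqrt{\frac{n}{n-1}\,\Gamma_2}. \qquad (47)$$

From (40) and (47), then

$$\epsilon^2 \le \sum_{j=1}^{n} \sigma^2_{x_j} + \delta - \sqrt{\frac{n}{n-1}\,\Gamma_2}. \qquad (48)$$

Therefore, if we define $\lambda_2^*$ to be

$$\lambda_2^* = 1 - \frac{\sum_{j=1}^{n} \sigma^2_{x_j} + \delta - \sqrt{\frac{n}{n-1}\,\Gamma_2}}{\sigma_t^2}, \qquad (49)$$

it must be that $\lambda_2^*$ is a lower bound to $\rho_t^2$, according to (48) and (13):

$$\lambda_2^* \le \rho_t^2 \le 1. \qquad (50)$$

This $\lambda_2^*$ is a generalization of the lower bound $\lambda_2$ given elsewhere for the case of experimental independence (3), and reduces to the latter when the $\gamma_{x_g x_j i}$, and hence also $\delta$, vanish.

It is of interest to see how $\lambda_1^*$ and $\lambda_2^*$ can be written in a different form, using the notation of $\Gamma_1$. Sum both members of (31) over all g and j, and remember (33), (15), (16), and (22):

$$\sigma_t^2 = \Big(\sum_{j=1}^{n} E_i\,\sigma^2_{x_j i} + \delta\Big) + \Big(\sum_{j=1}^{n} \sigma^2_{X_j} + \Gamma_1\Big). \qquad (51)$$

Transposing, and using (39), we have

$$\Gamma_1 = \sigma_t^2 - \sum_{j=1}^{n} \sigma^2_{x_j} - \delta. \qquad (52)$$

Using (52) in (42) shows that

$$\lambda_1^* = \frac{\Gamma_1}{\sigma_t^2}, \qquad (53)$$

and using (52) in (49) shows that

$$\lambda_2^* = \frac{\Gamma_1 + \sqrt{\frac{n}{n-1}\,\Gamma_2}}{\sigma_t^2}. \qquad (54)$$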
7. The Third Lower Bound; Application to Noncompleted Tests 


Another identity that can be Written for У)» сї, is as follows: 


n 2 
È oi = МС) + mé +). 69 
i=l 
This follows by transforming (24) into

$$\Gamma_2 = n(n-1)\beta^2 + \frac{\Gamma_1^2}{n(n-1)} \qquad (56)$$

and then substituting (56) into (44).

Identity (55) enables us to weaken \% 
computing Г, . According to (55), we have the 
В, о?, and g— 


in order to save the labor of 
practical inequality—omitting 


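$$\sum_{j=1}^{n} \sigma^2_{X_j} \ge \frac{|\Gamma_1|}{n-1}. \qquad (57)$$

Notice that it is the absolute value of $\Gamma_1$ that appears on the right. $\Gamma_1$ can be negative as well as positive. But certainly, if (57) is true, then we can also remove the absolute value sign,

$$\sum_{j=1}^{n} \sigma^2_{X_j} \ge \frac{\Gamma_1}{n-1}. \qquad (58)$$

A working formula for $\Gamma_1$ is given by (52):

$$\Gamma_1 = \sigma_t^2 - \sum_{j=1}^{n} \sigma^2_{x_j} - \delta. \qquad (59)$$

Therefore, if we define our third lower bound to be

$$\lambda_3^* = \frac{n}{n-1}\left(1 - \frac{\sum_{j=1}^{n} \sigma^2_{x_j} + \delta}{\sigma_t^2}\right), \qquad (60)$$

then, from (59), (58), (40), and (13), we have

$$\lambda_3^* \le \rho_t^2 \le 1. \qquad (61)$$

A comparison of (60) with (42) shows that $\lambda_3^*$ differs from $\lambda_1^*$ only by the factor $n/(n-1)$,

$$\lambda_3^* = \frac{n}{n-1}\,\lambda_1^*. \qquad (62)$$

Also, from (53) and (54), we see that if $\lambda_1^* \ge 0$, then

$$\lambda_1^* \le \lambda_3^* \le \lambda_2^*. \qquad (63)$$

The best of the three bounds is $\lambda_2^*$, but the most convenient for general use will undoubtedly be $\lambda_3^*$. When $\delta = 0$, $\lambda_3^*$ coincides with the $\lambda_3$ of our previous paper (3).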
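The three bounds can be computed side by side. The sketch below is illustrative only: the function name `guttman_star_bounds` and the sample covariance matrix are hypothetical, and $\Gamma_2$ is taken from the observed off-diagonal covariances, which is justified only under experimental independence (otherwise it has to be supplied from the assumed law of dependence).

```python
import numpy as np


def guttman_star_bounds(cov, delta=0.0, gamma2=None):
    """Return (lambda1*, lambda2*, lambda3*) for an n-part test.

    cov    : n x n covariance matrix of the observed part scores.
    delta  : dependence term; 0.0 corresponds to experimental independence.
    gamma2 : Gamma_2 if known. When omitted it is taken from the observed
             off-diagonal covariances, an assumption that holds only
             under experimental independence.
    """
    cov = np.asarray(cov, dtype=float)
    n = cov.shape[0]
    var_total = cov.sum()                        # sigma_t^2, identity (33)
    gamma1 = var_total - np.trace(cov) - delta   # working formula (59)
    if gamma2 is None:
        off_diag = cov - np.diag(np.diag(cov))
        gamma2 = float((off_diag ** 2).sum())
    lam1 = gamma1 / var_total                                    # (53)
    lam2 = (gamma1 + np.sqrt(n / (n - 1) * gamma2)) / var_total  # (54)
    lam3 = n / (n - 1) * lam1                                    # (62)
    return lam1, lam2, lam3


# Hypothetical 3-part covariance matrix, for illustration only.
cov3 = [[4.0, 2.0, 1.0],
        [2.0, 3.0, 1.5],
        [1.0, 1.5, 2.0]]
lam1, lam2, lam3 = guttman_star_bounds(cov3)
print(lam1, lam2, lam3)
```

For a two-part test the second and third values coincide whenever $\Gamma_1$ is not negative, which is a convenient sanity check on any implementation.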

As an application of $\lambda_3^*$, let us consider the case of speeded or noncompleted tests. A complete analysis will be given in a later paper (5), and we state here only one of the results. If the $x_{ijk}$ are scored only in the range between zero and unity (say each part score is based on one item), if unattempted items are all scored zero, and if the only experimental dependence among the items is a pure serial relation in the attempts (omitting one item implying omitting all the following ones), then it can be shown that


пао (мич, У) 


g-1 


(64) 


on who attempted item 9, and £; is again as defined 
modified lower bound, say M ; somewhat 
). This provides à solution 


king into account 


where 7, is the proporti 
in (7). Using (64) in (60) yields a 
smaller than àA% because of the inequality in (64 
to the problem of studying the reliability of speeded tests, tà 
the experimental dependence that occurs. 


8. Summary of the Formulas 
In this paper we have developed two kinds of formulas: identities and inequalities. Neither kind has involved any hypotheses or assumptions about the data, so that both apply universally. The identities are not usable in practice when data are available from only a single experiment, but the inequalities are.


238 PSYCHOMETRIKA 


It is useful to assemble the identities into direct formulas for р? . One 
such formula is 


n 


PIC pie 4 A (dem q-u 


g-1--5 + (65) 


с; 


'This follows from (13), (40), and (44). Using (59), we can also write (65) as 


: Ty + ALMA sd na? + 2) | 


в: = 5 (66) 


с, 
In (65) and (66), only o? and 
trial. They depend on the equiva 
(65) and (66) show the exact r 
of knowledge of the actual valu 
top. 
Another identity for p? 


€^ are necessarily unobservable in a single 
lence of the part scores of the test. Identities 
ole of nonequivalence for reliability; absence 
es still leaves possible universal lower bounds 


can be written as follows: 


г 4 T, ) m» n 2 *) 
А ivl nc) ug ват + 
р: = = s: 


(67) 


с; 


"This follows from using (56) in (66; 


). It may be helpful to write again here the working formula for $\Gamma_1$:

$$\Gamma_1 = \sigma_t^2 - \sum_{j=1}^{n} \sigma^2_{x_j} - \delta. \qquad (59)$$

While the quantity $\beta^2$ in the right of (67) may be observable, being based on $\Gamma_2$ and $\Gamma_1$, the cumbersome calculations may be omitted in the practical inequalities, at the expense of some weakening of the inequalities. $\Gamma_2$ does not appear explicitly in (67).

Omitting the (nonnegative) radical in (65), (66), or (67) leads to lower bound $\lambda_1^*$:

$$\lambda_1^* = 1 - \frac{\sum_{j=1}^{n} \sigma^2_{x_j} + \delta}{\sigma_t^2}. \qquad (42)$$


Omitting only the (nonnegative) terms involving o? anq «? in (66) leads to 
the second lower bound: 


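$$\lambda_2^* = 1 - \frac{\sum_{j=1}^{n} \sigma^2_{x_j} + \delta - \sqrt{\frac{n}{n-1}\,\Gamma_2}}{\sigma_t^2}. \qquad (49)$$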


LOUIS GUTTMAN 239 


itti 5 ( 1 i i 
Omitting the nonnegative) terms in В, а, and e m 67) leads to the third 
lower bound: i i | 
$$\lambda_3^* = \frac{n}{n-1}\left(1 - \frac{\sum_{j=1}^{n} \sigma^2_{x_j} + \delta}{\sigma_t^2}\right). \qquad (60)$$


An important special case is when n = 2, or the test is divided into two parts. This leads to "split-half" formulas. When n = 2 then $\beta^2$ is identically zero; there is only one "true" covariance, $\gamma_{X_1 X_2}$, so there is no variation among "true" covariances. For this special case, (66) and (67) become


й = г. + ME: +2 Q=, E 
where now

$$\Gamma_1 = \sigma_t^2 - \sigma^2_{x_1} - \sigma^2_{x_2} - \delta, \quad (n = 2), \qquad (69)$$

and

$$\delta = 2E_i\,\gamma_{x_1 x_2 i}, \quad (n = 2). \qquad (70)$$

In this special case of n = 2, if $\Gamma_1 \ge 0$, we always have $\lambda_2^* = \lambda_3^*$, according to (49) and (60). Because of the importance of this joint special case of $\lambda_2^*$ and $\lambda_3^*$, we shall label its lower bound separately, as $\lambda_4^*$, where

$$\lambda_4^* = 2\left(1 - \frac{\sigma^2_{x_1} + \sigma^2_{x_2} + \delta}{\sigma_t^2}\right), \quad (n = 2), \qquad (71)$$

$\delta$ being defined as in (70). $\lambda_4^*$ generalizes the "split-half" bound $\lambda_4$, defined elsewhere (3), to the case of experimental dependence.

In the lower bounds defined in (42), (49), (60), and (71), respectively, $\sigma_t^2$ and each $\sigma^2_{x_j}$ are observable from but a single trial, as proved elsewhere in (3). $\delta$ and $\Gamma_2$ depend on the law of experimental dependence among the part scores; different laws will give these different working formulas. The present general formulas show precisely how and where experimental dependence enters the problem, no matter what the nature of the data or the law of dependence.


REFERENCES

1. Cronbach, Lee J., and Warrington, W. G. Time-limit tests: estimating their reliability and degree of speeding. Psychometrika, 1951, 16, 167-187.

2. Gulliksen, Harold. Theory of mental tests. New York: Wiley, 1950.

3. Guttman, Louis. A basis for analyzing test-retest reliability. Psychometrika, 1945, 10, 255-282.

4. Guttman, Louis. A special review of Gulliksen's Theory of mental tests. Psychometrika, 1953, 18, 123-130.

5. Guttman, Louis. The reliability of speeded or noncompleted tests. (In preparation).


Manuscript received 1/6/53 


PSYCHOMETRIKA—VOL. 18, No. 3 
SEPTEMBER, 1953 


NOTE ON MILLER'S “FINITE MARKOV PROCESSES IN 
PSYCHOLOGY" 


RICHARD C. W. KAO
UNIVERSITY OF MICHIGAN 


In his article "Finite Markov Processes in Psychology,"* G. A. Miller derived a least-squares "estimate" for a matrix of transitional probabilities. However, the mathematical proof is found to be invalid.

On page 158, Miller defined $\hat{N}$ by the equation

$$\hat{N} = N + C, \qquad (19)$$

"where the elements of the matrix C are the corrections that must be added to the observed values in N to give the best estimate $\hat{N}$." He wished to determine $\hat{T}$, the "best" estimate of the transformation. From the definitions of $\hat{T}$, M, N, $\hat{N}$, and C, he argued that the following equation holds:

$$\hat{T}M = \hat{N} = N + C. \qquad (20)$$

It is clear from this equation that $\hat{T}$ would be "best" in a trivial sense if C is assumed to be the zero matrix, i.e., $\hat{N} = N$. We shall show that Miller had in fact derived only this trivial estimate by means of his undefined mathematical technique.†

*Psychometrika, 1952, 17, 149-167.
†For a valid mathematical proof of a least-squares estimate in this connection, see Goodman's "A Further Note on 'Finite Markov Processes in Psychology.'" This issue, 245-248.

From equation (20) Miller obtained another expression for C:

$$C = -N + \hat{T}M. \qquad (21)$$

For a least-squares solution, he argued that CC' must be a minimum. But this minimum cannot be obtained by simply ". . . putting the partial derivative with respect to $\hat{T}$ to zero:"

$$\frac{\partial}{\partial \hat{T}}\,CC' = MC' = 0, \qquad (22)$$

for the operation of differentiating a function of a matrix with respect to the matrix has not been defined at all in this connection. It is obvious from equation (21) that C' is as much a function of $\hat{T}$ as C. Hence, in differentiating the expression

$$CC' = (-N + \hat{T}M)C' = -NC' + \hat{T}MC',$$

one cannot assert that $\partial/\partial\hat{T}\,(-NC') = 0$. For then one would be asserting $\partial\hat{T}'/\partial\hat{T} = 0$, a result which is inconsistent with the undefined operation $\partial\hat{T}/\partial\hat{T} = 1$, in the case of symmetric matrices where one clearly has $\hat{T}' = \hat{T}$. Hence, the first equality in equation (22) cannot be meaningful.

The second equality in equation (22) in effect requires C to be a zero matrix. For in order that MC' = 0 for whatever M, C' or C must be zero. Granted that zero-divisors (i.e., AB = 0 for A ≠ 0 and B ≠ 0) are possible


, and consequently, one can choose а matrix M with all positive 


— 0. In view of the fact that 
equation (22) is asserted to hold in general, we conclude that it does only if 


C is the zero matrix, from which the tautological nature of Miller's argument 
becomes clear, 


уве ег one chooses to use matrix or scalar notation for 
mmunication with Professor Miller shows that he does the latter. 
ations in (n — 1) + 2 unknowns in C and 7, 
ned from setting the partial derivatives of 


9 2 E де 
E сє; = 2 с: = 0 |j = 1,2); 
He > 75 G-L2; 
whence MC' — 0 on substituting {9с:/91,} by 
one partieular element in a matrix in his “ 
whole matrix has thereby been minimized. 
assumptions lead to a matrix C of the form 


M. It is unclear why Professor Miller chooses 
matrix differentiation" and concludes that the 
In the case of two alternatives, various special 


n-1 n-1 
Э = Хе 
а i=l 
n-1 п-1 


ment with respect {0 {;,7 = 1, 2, is one 
lement with Tespect to the same thing? 


RICHARD С. W. КАО 243 


Apart from all these considerations, Miller went on to substitute his equation (21) into equation (22) and obtained the expression

$$M(-N + \hat{T}M)' = -MN' + MM'\hat{T}' = 0.$$

By rearranging terms, he got what he called the "best" estimate of T:

$$\hat{T} = NM'(MM')^{-1}. \qquad (23)$$

In case M, N are non-singular matrices (Miller's assumption that they be of order a × (n − 1) does not, of course, prevent a from being (n − 1)), or M, N have inverses, we may show the tautological nature of Miller's argument by a different method.* We proceed to simplify equation (23) as follows:

$$\hat{T} = NM'(MM')^{-1} = NM'(M')^{-1}M^{-1} = NIM^{-1} = NM^{-1}$$

or

$$\hat{T}M = N,$$

which shows again that in equation (20) Miller had assumed C = 0 or $\hat{N} = N$.

In case M, N are singular, this second proof would not apply; but the first still would. On the other hand, neither could Miller capitalize on the irrelevant fact that M, N are singular to prove the validity of his results. We have here something which is essentially a mathematical identity, the validity of which is independent of the choice of M and N. Hence, in order to show that equation (23) does not hold in general except in the trivial sense, it is sufficient to produce one counter-example where M, N are non-singular. For the logical denial of a proposition which reads, "for all x, P(x) is true" is that "there exists one x such that P(x) is false."

As a casual remark we note that setting the partial derivatives with respect to a variable to zero is only a necessary and not a sufficient condition for obtaining a minimum. For the latter, the second-order condition cannot be ignored. Granted that the experimental interpretation of CC' is such that a maximum is unlikely or even impossible, there is no guarantee, on the other hand, that the stationary value obtained from using only the first-order condition is extremal at all. The mathematical problem of minimizing quadratic forms in general is not as simple as one may believe it to be.

Finally, it is to be noted that the contention in this note refers to Miller's "mathematical proof" of his best estimate for the matrix of transitional probabilities and not to his "psychological interpretation" of finite Markov processes.

Manuscript received 11/19/52
Revised manuscript received 4/4/53

*We note that this argument does not apply to Goodman's results, where M, N are column vectors.
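The simplification of equation (23) in the note above can be checked numerically. The following is a minimal sketch under stated assumptions: the two 3 × 3 matrices are hypothetical data chosen only to be non-singular with columns summing to one, not values from Miller's paper.

```python
import numpy as np

# Hypothetical square, non-singular M and N (a = n - 1 = 3); columns sum to one.
M = np.array([[0.6, 0.3, 0.2],
              [0.3, 0.5, 0.2],
              [0.1, 0.2, 0.6]])
N = np.array([[0.5, 0.4, 0.3],
              [0.3, 0.4, 0.3],
              [0.2, 0.2, 0.4]])

T_hat = N @ M.T @ np.linalg.inv(M @ M.T)   # equation (23)
C = T_hat @ M - N                          # corrections, equation (21)

# Equation (23) collapses to N M^{-1}, so the corrections vanish.
same_as_inverse_form = np.allclose(T_hat, N @ np.linalg.inv(M))
corrections_are_zero = np.allclose(C, 0.0)
print(same_as_inverse_form, corrections_are_zero)
```

When M is square and invertible the fit is exact, so nothing is left for a least-squares criterion to minimize; this is the tautology the note describes.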


PSYCHOMETRIKA—VOL. 18, No. 3 
SEPTEMBER, 1953 


A FURTHER NOTE ON "FINITE MARKOV PROCESSES IN
PSYCHOLOGY"*


Leo A. GOODMAN 
THE UNIVERSITY OF CHICAGO 


In his interesting article "Finite Markov Processes in Psychology," G. A. Miller derived a "least-squares estimate" for a matrix of transitional probabilities (1). However, the mathematical proof seems to be unclear. Since this proof is considered invalid (2), we shall present a somewhat clearer version of the proof of this result. We shall also examine the general problem in some detail.

In the proof we shall assume that the reader is familiar with matrix notation, which enables a considerably shorter presentation. We shall follow the matrix conventions and the terminology adopted by Miller (1).

Let $m_{ik}$ (i = 1, 2, ···, a) represent the observed distribution on the kth trial (k = 1, 2, ···, n). That is, $m_{ik}$ is the proportion observed in the ith alternative quantity on the kth trial. There are a alternative quantities and n such trials. Let

$$M = \begin{bmatrix} m_{11} & m_{12} & \cdots & m_{1,n-1} \\ m_{21} & m_{22} & \cdots & m_{2,n-1} \\ \vdots & \vdots & & \vdots \\ m_{a1} & m_{a2} & \cdots & m_{a,n-1} \end{bmatrix}$$

be the a × (n − 1) matrix obtained by placing in successive columns the distributions observed on successive trials, from trial 1 through trial n − 1. Following the notation adopted by Miller, let $t_{ij}$ be the transitional probability that an observation which is in the jth alternative quantity (j = 1, 2, ···, a) at a given trial, will be in the ith alternative quantity (i = 1, 2, ···, a) on the following trial. We define the row vectors $T_i = [t_{i1}, t_{i2}, \cdots, t_{ia}]$ and the a × a transformation matrix

$$T = \begin{bmatrix} t_{11} & t_{12} & \cdots & t_{1a} \\ t_{21} & t_{22} & \cdots & t_{2a} \\ \vdots & \vdots & & \vdots \\ t_{a1} & t_{a2} & \cdots & t_{aa} \end{bmatrix}.$$

*This report was prepared in connection with research supported by the Office of Naval Research.

We also define the row vectors $N_i = [m_{i2}, m_{i3}, m_{i4}, \cdots, m_{in}]$ and $C_i = T_iM - N_i$. The problem is then to determine a matrix

$$\hat{T} = \begin{bmatrix} \hat{t}_{11} & \hat{t}_{12} & \cdots & \hat{t}_{1a} \\ \hat{t}_{21} & \hat{t}_{22} & \cdots & \hat{t}_{2a} \\ \vdots & \vdots & & \vdots \\ \hat{t}_{a1} & \hat{t}_{a2} & \cdots & \hat{t}_{aa} \end{bmatrix}$$

such that $C_iC_i'$ is minimized for all values of i (i = 1, 2, ···, a) when T is taken equal to $\hat{T}$. By the usual proof in the theory of least squares (cf. [3], 55), we see that $C_iC_i'$ is minimized when $T_i$ is taken equal to

$$\hat{T}_i = N_iM'(MM')^{-1} \quad \text{(if } MM' \text{ is nonsingular)}.$$

Hence, the "least-squares estimate" is

$$\hat{T} = \begin{bmatrix} \hat{T}_1 \\ \hat{T}_2 \\ \vdots \\ \hat{T}_a \end{bmatrix},$$

which is, in fact,

$$\hat{T} = NM'(MM')^{-1},$$

where N is defined as

$$N = \begin{bmatrix} N_1 \\ N_2 \\ \vdots \\ N_a \end{bmatrix}. \qquad \text{Q.E.D.}$$
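The estimate just derived can be computed directly and compared with a generic least-squares solver. This is only a sketch: the observed distributions in `m` are hypothetical (a = 2 alternatives, n = 5 trials), and `numpy.linalg.lstsq` is used purely as an independent check, not as part of the derivation above.

```python
import numpy as np

# Hypothetical observed distributions: a = 2 alternatives, n = 5 trials.
m = np.array([[0.50, 0.60, 0.68, 0.70, 0.75],
              [0.50, 0.40, 0.32, 0.30, 0.25]])
M = m[:, :-1]   # distributions on trials 1 .. n-1
N = m[:, 1:]    # distributions on trials 2 .. n

# Closed form: T_hat = N M' (M M')^{-1}, valid when M M' is nonsingular.
T_hat = N @ M.T @ np.linalg.inv(M @ M.T)

# Independent check: solve M' T' = N' in the least-squares sense.
T_lstsq = np.linalg.lstsq(M.T, N.T, rcond=None)[0].T
agrees = np.allclose(T_hat, T_lstsq)
print(T_hat)
print(agrees)
```

Each row of `T_hat` minimizes its own sum of squared residuals, which is why solving all rows at once gives the same matrix.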


We might wish our estimates $\hat{T}$ of the parameter T to have some of the same properties as the parameter T. For example, it may sometimes be desirable to require $[1]\hat{T} = [1]$, since we have that $[1]T = [1]$, where [1] is the a-dimensional row vector [1, 1, ···, 1]. We shall now prove that the "least-squares estimate" $\hat{T}$ has this desirable property.

THEOREM: We have that $[1]\hat{T} = [1]$, where $\hat{T} = NM'(MM')^{-1}$.

PROOF: From the definition of N, we see that

$$[1]N = [1].$$

Hence, it is sufficient to prove that

$$[1]M'(MM')^{-1} = [1],$$

or

$$[1]M' = [1]MM'.$$

From the definition of M, we have that $[1]M = [1]$. Hence we have that $[1]MM' = [1]M'$. Q.E.D.
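The theorem can be illustrated numerically. In this sketch the data are randomly generated and purely hypothetical (a = 3 alternatives, n = 8 trials, fixed seed); the only property relied on is that every observed column is a distribution summing to one.

```python
import numpy as np

rng = np.random.default_rng(0)
a, n = 3, 8
m = rng.random((a, n))
m /= m.sum(axis=0)            # each column is a distribution over a alternatives
M, N = m[:, :-1], m[:, 1:]

T_hat = N @ M.T @ np.linalg.inv(M @ M.T)
column_sums = np.ones(a) @ T_hat   # the row vector [1] times T_hat
print(column_sums)
```

The column sums come out as one up to rounding, although individual entries of `T_hat` are not guaranteed to be non-negative.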

We might also have obtained these results using the general regression methods presented by S. S. Wilks (4). The problem is that of estimating a × a parameters which are subject to a linear restraints. We shall be interested in minimizing $\sum_{i=1}^{a} C_iC_i'$ in order to obtain the "least-squares estimate." In other words, we wish to estimate a parameter which is a point in an a(a − 1) dimensional space. Since $t_{ij} \ge 0$, the parameter lies in a subset of this space. If we also wish our estimate $\hat{T}$ to lie in this same subset, the estimation is still quite straightforward. We first obtain the "least-squares estimate" $\hat{T}$. If this estimate lies in the subset (i.e., $\hat{t}_{ij} \ge 0$), then $\hat{T}$ is used to estimate T. If $\hat{T}$ is not included in the subset, then the appropriate estimate will lie on the boundary of the subset. We then use that estimate on the boundary of the subset which minimizes $\sum_{i=1}^{a} C_iC_i'$.

The numerical examples in (1) illustrate how this result is used in a learning experiment in a T-maze. As the author himself has pointed out, the least-squares fit described in (1) is not most efficient for Markov processes. If the observed transitional proportions are available, they would clearly be more appropriate in the estimation of transitional probabilities.


REFERENCES

1. Miller, G. A. Finite Markov processes in psychology. Psychometrika, 1952, 17, 149-167.

2. Kao, Richard. Note on Miller's "Finite Markov processes in psychology." Psychometrika, 1953, 18, 241-243.

3. Kempthorne, Oscar. The design and analysis of experiments. New York: John Wiley and Sons, Inc., 1952.

4. Wilks, S. S. Mathematical statistics. Princeton: Princeton Univ. Press, 1944.

Manuscript received 2/15/53
Revised manuscript received 3/14/53


PSYCHOMETRIKA—VOL. 18, No. 3 
SEPTEMBER, 1953 


A PROPOSED INDEX OF THE CONFORMITY OF ONE
SOCIOMETRIC MEASUREMENT TO ANOTHER*

LEO KATZ AND JAMES H. POWELL

MICHIGAN STATE COLLEGE
AND
UNIVERSITY OF CALIFORNIA AT BERKELEY

An index is proposed to measure the extent of agreement of the data of a sociometric test with another test made at an earlier time or on another test criterion. The index is used to define an index of concordance between the two tests. It is shown how the index may be used for either individuals or groups. Tests of the hypothesis that agreement is random are given for all cases and applied to an example.


1. Introduction. Whenever two or more (sociometric) measurements are 
made of the interpersonal relationships among one group of individuals, 
questions immediately arise concerning the extent to which one set of measure- 
ments conforms to another. Examples of this kind abound in the literature; 
we will note only a few.

A second measurement on the same group invariably raises the intriguing question of how much the pattern of relationships observed in the earlier measurement has persisted in the later. Further (and in a more fundamental sense), what is the nature of this persistence? An example of this kind of data was presented to the authors by Dr. Hilda Taba of San Francisco State College. In the course of an extensive study of a class of 25 children, she asked each of them, at intervals of about two months, to name the three others they preferred to be seated with, in smaller groups. The resulting series of measurements provide information on the persistence of choice patterns and, in particular, data to test the hypothesis that the persistence phenomenon is stationary, i.e., dependent only on the time interval between a pair of measurements.

In the study discussed above, the subjects were also asked at one point to name those others with whom they would like to work on a specific activity, namely, mathematics assignments. Noting some conformity of the special choice patterns to the general, we ask whether this is a random effect, and, if not, what is the order of the excess over chance conformity? Similar questions might be raised for other special situations; in this case, the relative excesses could be used to determine (inversely) how "special" each situation is, in comparison with the others.

*Work done under the sponsorship of the Office of Naval Research.

As a final example, and the one to which we shall return for illustration 
because of the smaller numbers involved, we consider the technique in which 
each individual is asked to give, first, his choices among the others, and, 
second, his guesses as to which of the others will choose him. Here, we are 
usually concerned with whether, and to what extent, the perceived choice 
configuration conforms to the actual. Also, in this case to a greater measure 
than in the preceding, we are interested in the variable accuracy of perception 
among the individuals in the group. 

In each of the last two examples, we encounter the usual confusion of 
"independent" and "dependent" variables, with neither bearing a causal 
relationship to the other. In the first example, with measurements in a natural 
time sequence, there is no possibility of confusion of priority. We shall return 
to this question in section 5. 


2. The index. For a group of n individuals there are n(n — 1) ordered pairs. 
Our basic information specifies, for each ordered pair, whether (a) neither 
relation exists, (b) relation X exists but not Y, (c) Y exists but not Х, or 
(d) both X and Y exist. Following custom, we take relation X to be the 
prior or "independent" relation; Y the posterior or "dependent" relation. 
The generic question we ask is: To what extent does the occurrence of relation Y in the ordered pairs conform to the occurrence of relation X? In any specific instance, our data may be summarized as in the fourfold distribution of Table 1.

TABLE 1
Joint Distribution of Occurrence of X and Y for a Group of n Persons

             $Y$                $\bar{Y}$                Total
$X$          $n_{xy}$           $n_{x\bar{y}}$           $n_x$
$\bar{X}$    $n_{\bar{x}y}$     $n_{\bar{x}\bar{y}}$     $n_{\bar{x}}$
Total        $n_y$              $n_{\bar{y}}$            $n(n-1)$

In Table 1, $n_{x\bar{y}}$ (for example) represents the number of ordered pairs in which relation X occurs and relation Y does not occur from the first individual to the second of the pair; $n_y$, equal to the sum of $n_{xy}$ and $n_{\bar{x}y}$, is the number of pairs having relation Y without regard to occurrence or non-occurrence of X. In this form, it is well known that questions of dependence
or concomitant variation depend on the four numbers in the body of the 
table and that the marginal totals provide only a frame of freedom for the 


body of the table when the marginal totals are considered to be known a priori. Hence we may take any one of the numbers in the body of the table, say $n_{xy}$, as the essential variable. Lastly, $n_{xy}$ is known to possess the hypergeometric probability distribution under the hypothesis that X and Y are independent, or, in our terms, that occurrence of Y does not particularly conform to occurrence of X.

We now proceed to construction of an index of conformity ($\Gamma$) having the following three properties:

(1) $\Gamma = 0$ when X and Y are independent,

(2) $\Gamma = 1$ when occurrence of Y conforms exactly to occurrence of X (note that we say nothing about non-occurrences of Y),

(3) $\Gamma$ is estimated by a linear function of $n_{xy}$.

We first note that, when X and Y are independent, $E(n_{xy}) = n_xn_y/n(n-1)$. Since we know that $n_{xy}$ cannot be less than zero nor greater than $n_x$, we define $\Gamma$ by the following expression for the conditional expected (mean) value of $n_{xy}$:

$$E(n_{xy} \mid \Gamma) = \frac{n_x}{n(n-1)}\,(n_y + \Gamma n_{\bar{y}}). \qquad (1)$$

We observe that $E(n_{xy} \mid \Gamma = 0) = n_xn_y/n(n-1)$, which is precisely the condition for statistical independence of X and Y. Secondly, $E(n_{xy} \mid \Gamma = 1) = n_x$, i.e., every ordered pair which has relation X also has relation Y and the conformity is complete. $\Gamma$ is a linear function of $E(n_{xy})$; therefore, we take for our estimate of $\Gamma$ the solution of (1) with $E(n_{xy})$ replaced by the observed $n_{xy}$. This gives

$$\hat{\Gamma} = \frac{1}{n_xn_{\bar{y}}}\,[n(n-1)n_{xy} - n_xn_y]. \qquad (2)$$

Equation (1) expresses that the expected value of $n_{xy}$ depends upon the underlying parameter $\Gamma$, which may take any value in the interval $[-(n_y/n_{\bar{y}}), 1]$. In most applications, $n_y$ is smaller than $n_{\bar{y}}$. Since the estimate of equation (2) has the appropriate conventional expected values of zero when Y does not conform to X and unity when conformity is perfect, we may (and shall) take the estimate to be our index of conformity.

One advantage of this choice of index is that we have immediately an unbiased estimate of the conceptual underlying parameter. Another advantage, we shall see, is that $\hat{\Gamma}$ lends itself to simple standard tests of the hypothesis that $\Gamma = 0$. Still another advantage, not insignificant from the standpoint of the practitioner, is the ease of computation of this index.
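Equation (2) is simple enough to sketch in a few lines. The function name `conformity_index` and the counts used below are hypothetical illustrations (a group of ten persons, hence 90 ordered pairs); they are not the data analyzed later in the paper.

```python
def conformity_index(n_xy, n_x, n_y, total):
    """Estimate of the index of conformity, equation (2).

    n_xy  : ordered pairs having both relations X and Y
    n_x   : ordered pairs having relation X
    n_y   : ordered pairs having relation Y
    total : number of ordered pairs, n(n - 1) for a group of n persons
    """
    n_not_y = total - n_y
    return (total * n_xy - n_x * n_y) / (n_x * n_not_y)


total = 10 * 9                                   # hypothetical group of 10 persons
observed = conformity_index(18, 30, 30, total)   # some conformity
independent = conformity_index(10, 30, 30, total)  # n_xy at its chance expectation
complete = conformity_index(30, 30, 30, total)   # every X pair is also a Y pair
lowest = conformity_index(0, 30, 30, total)      # equals -(n_y / n_not_y)
print(observed, independent, complete, lowest)
```

The three boundary cases reproduce the properties required of the index: zero at the chance expectation, unity at complete conformity, and the lower limit of the interval when no pair has both relations.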


3. Test of the hypothesis of absence of conformity. In most cases in practice, we will believe on intuitive grounds that there actually is some degree of conformity present and shall desire only to estimate that degree. Nevertheless, it is logically necessary that we establish that our belief is well founded. Accordingly, we give in this section a test of the hypothesis that $\Gamma = 0$. It was shown by Katz (2) in 1941 that a "best" test for $\Gamma = 0$ against alternatives $\Gamma > 0$ (in the likelihood-ratio sense) is given by the upper tail of the hypergeometric distribution and that a "best unbiased" test against $\Gamma \neq 0$ is given by two tails of the same distribution so chosen that the mean value of the tails is equal to the mean of the entire distribution. (Note that, aside from choice of critical regions, this is the well-known Fisher "exact" test for the four-fold table.)

Thus, whenever n(n − 1) is small, we may test the hypothesis exactly, using the recent tables given by Finney (1). When, as is more likely, n(n − 1) is large and each of $n_{xy}$, $n_{x\bar{y}}$, $n_{\bar{x}y}$, and $n_{\bar{x}\bar{y}}$ is large enough (say, >2), the $\chi^2$ approximation is adequate. A simple computation gives

$$\chi^2 = \frac{n(n-1)\,n_xn_{\bar{y}}}{n_{\bar{x}}n_y}\,\hat{\Gamma}^2, \qquad (3)$$

with one d.f. Even simpler is

$$z = \hat{\Gamma}\sqrt{\frac{n(n-1)\,n_xn_{\bar{y}}}{n_{\bar{x}}n_y}}, \qquad (3a)$$

which is approximately normally distributed about zero with unit variance. In case $n_x = n_y$, (3a) reduces approximately to $z = n\hat{\Gamma}$ and the hypothesis is rejected at the 5% level whenever $|\hat{\Gamma}| > 2/n$, approximately.

As always when the $\chi^2$, or the equivalent z, approximation to the exact test is made, one should make the Yates correction, consisting of either adding or subtracting one-half unit from $n_{xy}$ so as to decrease the absolute value of $\hat{\Gamma}$.
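Both tests of this section can be sketched with the standard library alone. The function names and the counts are hypothetical; the exact tail is computed from binomial coefficients rather than read from Finney's tables, and the Yates step is implemented here as moving $n_{xy}$ half a unit toward its chance expectation, which is one reading of "so as to decrease the absolute value" of the index.

```python
from math import comb, sqrt


def z_statistic(n_xy, n_x, n_y, total, yates=True):
    """Normal approximation (3a), optionally with the Yates correction."""
    n_not_x = total - n_x
    n_not_y = total - n_y
    expected = n_x * n_y / total
    if yates:
        # Move n_xy half a unit toward its expectation, never past it.
        if n_xy > expected:
            n_xy = max(n_xy - 0.5, expected)
        elif n_xy < expected:
            n_xy = min(n_xy + 0.5, expected)
    gamma_hat = (total * n_xy - n_x * n_y) / (n_x * n_not_y)
    return gamma_hat * sqrt(total * n_x * n_not_y / (n_not_x * n_y))


def upper_tail(n_xy, n_x, n_y, total):
    """Exact hypergeometric probability of a count at least as large as n_xy."""
    top = min(n_x, n_y)
    ways = sum(comb(n_x, k) * comb(total - n_x, n_y - k)
               for k in range(n_xy, top + 1))
    return ways / comb(total, n_y)


# Hypothetical counts for a group of 10 persons (90 ordered pairs).
z_plain = z_statistic(18, 30, 30, 90, yates=False)
z_corrected = z_statistic(18, 30, 30, 90)
p_exact = upper_tail(18, 30, 30, 90)
print(z_plain, z_corrected, p_exact)
```

Squaring the uncorrected z gives the chi-square of equation (3), so only one of the two needs to be computed in practice.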


4. The index for an individual. One is often in the position of wishing to make tests and estimates, similar to those described above, for the individual members of the group. In these instances, one asks, "To what extent does an individual's pattern of (outgoing) choices persist to a later time or to another criterion of choice?" or "To what extent is the set of an individual's incoming choices unchanged?" In either case, our data are the observations on (n − 1) ordered pairs and may be exhibited as in Table 2.

TABLE 2
Joint Distribution of Occurrence of X and Y for a Single Individual

             $Y$                  $\bar{Y}$                  Total
$X$          $n_{xy}(i)$          $n_{x\bar{y}}(i)$          $n_x(i)$
$\bar{X}$    $n_{\bar{x}y}(i)$    $n_{\bar{x}\bar{y}}(i)$    $n_{\bar{x}}(i)$
Total        $n_y(i)$             $n_{\bar{y}}(i)$           $n - 1$

In this table, entries are interpreted as in Table 1 except that it is necessary to note that the entries in the body of the table and the marginal totals are those for the ith individual. Everything goes through exactly as in §2 and we obtain the index of conformity for the ith individual,

$$\gamma(i) = \frac{1}{n_x(i)\,n_{\bar{y}}(i)}\,[(n-1)\,n_{xy}(i) - n_x(i)\,n_y(i)]. \qquad (4)$$

The tests of §3 hold as before with $\hat{\Gamma}$ replaced by $\gamma(i)$, n(n − 1) by (n − 1), and the marginal totals by the corresponding totals for the ith individual. Here we shall usually require the exact test and, for very small groups, the test may break down in the sense that we are unable to reject the hypothesis of lack of conformity whatever be the value of $n_{xy}(i)$. We shall observe this in the example of §6, in which we deal with a group of ten persons.
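Equation (4) can be applied to every member of a group at once when the two relations are stored as choice matrices. The sketch below makes several assumptions not in the text: relations are coded as 0/1 arrays with row i holding the outgoing choices of person i, the diagonal is ignored, and the index is reported as undefined (NaN) when its denominator is zero. The function name and the four-person data are hypothetical.

```python
import numpy as np


def individual_indices(X, Y):
    """Index of conformity (4) for each person's outgoing choices.

    X, Y : n x n arrays of 0/1 entries; entry [i, j] is 1 when the
           relation holds from person i to person j.
    """
    X = np.array(X, dtype=float)
    Y = np.array(Y, dtype=float)
    np.fill_diagonal(X, 0.0)          # self-choices are not ordered pairs
    np.fill_diagonal(Y, 0.0)
    n = X.shape[0]
    n_x = X.sum(axis=1)
    n_y = Y.sum(axis=1)
    n_xy = (X * Y).sum(axis=1)
    denom = n_x * ((n - 1) - n_y)
    out = np.full(n, np.nan)          # undefined where the denominator is zero
    ok = denom > 0
    out[ok] = ((n - 1) * n_xy[ok] - n_x[ok] * n_y[ok]) / denom[ok]
    return out


# Hypothetical choices for a group of four persons.
X = [[0, 1, 1, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 0],
     [1, 0, 0, 0]]
Y = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 1, 0, 0]]
g = individual_indices(X, Y)
print(g)
```

Summing columns instead of rows would give the corresponding index for incoming choices.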
5. The ambiguous сазе. In many situations, it is not possible to identify 
one relation as antecedent or "independent" with respect to the other. In 
this сазе, we have exactly the same problem which arises in any regression 
analysis when we are uncertain as to which regression coefficient is meaningful. 
Accordiugly, we shall use exactly the same device for resolving the difficulty. 
We define a coefficient of concordance as the geometric mean of the two in- 
dices of conformity of X with Y and of Y with X. Thus, we obtain 


$$C^2 = \Gamma_1\,\Gamma_2 = \frac{\left[n(n-1)\,n_{xy} - n_x n_y\right]^2}{n_x\,n_{\bar{x}}\,n_y\,n_{\bar{y}}}, \tag{5}$$

where C is the coefficient of concordance. We should attach to C the algebraic
sign of the factor in square brackets since this is the sign of both indices
of conformity. Note that $C^2 = \chi^2/n(n-1)$, the mean square contingency of
Table 1. [See, for example, Kendall (3), 318-319.] It follows that the
test of significance for C is exactly the same as for either index of conformity.


6. An example. We shall consider data collected and kindly made available
to us by Dr. Renato Tagiuri, of the Harvard University Laboratory of
Social Relations. Dr. Tagiuri asked each of a group of ten graduate students
first, "Which members of the group do you like most?" and second, "Which
members of the group do you feel like you most?" In this situation, armchair
philosophizing can construct a case for either argument (a) that these people
are fairly sophisticated and, hence, perceive [...] or
argument (b) that these people are fairly [...]
their feelings so that relationships cannot [...]


254 PSYCHOMETRIKA 


there is a need for objective measurement of agreement. Equally obvious 
is the ambiguity of the situation, for it is difficult to decide whether we are 
primarily interested in how well perceived relationships conform to the 
actual ones or in the obverse conformity. Since, fortunately, resolution of 
this dilemma is not the purpose of this paper, we shall assume that the second 
question is our primary concern although we will make both computations. 
The data appear in Tables 3 and 3a. 


TABLE 3 
Positive Choices Expressed by Ten Individuals 


[Table body not recoverable.]




In both tables, the individuals’ responses appear in rows. Thus, the 
first individual chooses (in Table 3) the second, third, and fifth and feels he 
will be chosen by (in Table 3a) the second, fifth, and tenth. For the ith in- 
dividual, we are concerned with how well the choices he actually receives, 
the ith column of Table 3, agree with the choices he thinks he will receive, 
the ith row of Table 3a. We first consider individual conformity; the results 
of these computations are summarized in Table 4. 


TABLE 4
Conformity of Actual to Perceived Choices of Individuals

Individual (i)    n_xy(i)    Pr{n_xy(i) or more}    γ(i)
      1              3              .12             1.00
      2              3              .36              .56
      3              1              .42              .48
      4              2              .17             1.00
      5              3              .12             1.00
      6              0             1.00              .00
      7              1              .58              .24
      8              1              .22             1.00
      9              3              .048*           1.00
     10              2              .083             .57

*Significant at 5% level.


In the case of a small group, such as this, it is difficult to obtain sharp
tests of significance for individuals. Thus, while there is perfect agreement
with the choices of five individuals, in only one instance (9) can we reject
at the 5% level the hypothesis that agreement is by chance alone. We recommend,
therefore, whenever it is necessary to examine individual conformity
in small groups, that each individual be asked to name approximately half
the group as those most likely to choose him in order to make the tests as
sharp as possible.
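The exact test for one individual can be sketched as a one-sided hypergeometric tail over the (n − 1) ordered pairs. This is only an illustration: the function names are hypothetical, the form of the index is an assumption inferred from the reported values, and the margins used (3 perceived choices, 4 actual choices, 3 in agreement) are illustrative numbers chosen to be consistent with the row for individual 9, since the choice matrices themselves are not reproduced here.

```python
from math import comb


def upper_tail(n_pairs, n_x, n_y, n_xy):
    """Pr{n_xy or more agreements} with both margins fixed (hypergeometric)."""
    top = min(n_x, n_y)
    ways = sum(
        comb(n_y, k) * comb(n_pairs - n_y, n_x - k)
        for k in range(n_xy, top + 1)
    )
    return ways / comb(n_pairs, n_x)


def conformity(n_pairs, n_x, n_y, n_xy):
    """Assumed form of the index: equals 1 when every X choice is also a Y choice."""
    return (n_pairs * n_xy - n_x * n_y) / (n_x * (n_pairs - n_y))


# Illustrative margins for a group of ten (n - 1 = 9 ordered pairs).
p_value = upper_tail(9, n_x=3, n_y=4, n_xy=3)
index = conformity(9, n_x=3, n_y=4, n_xy=3)
print(f"Pr = {p_value:.3f}, index = {index:.2f}")

# With a single perceived choice, even perfect agreement cannot reach 5%.
smallest_p = upper_tail(9, n_x=1, n_y=1, n_xy=1)
print(f"smallest attainable Pr with one choice = {smallest_p:.3f}")
```

The second call shows why the test can break down in very small groups: with one perceived choice among nine pairs, the smallest attainable tail probability is 1/9.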

The story is quite different when we measure group agreement; here
our observations are adequate to construct reasonable tests. Our data and
computations are summarized in Table 5.

For a group of ten, the approximate test of §3 indicates significant (5%
level) departure from random agreement when |C| > .20; the degree of
association is measured by |C|, where C has the force of a correlation
coefficient. In fact, if X and Y are interpreted as variables taking values 0
and 1 only, C is the correlation coefficient.

It may be worth while (from the psychologist's point of view) to note
that the individuals who took part in this experiment were asked the same
two questions with respect to rejections, or negative choices. For these
data, the concordance index, C, was .14, not significantly different from
zero. We might conclude, therefore, that this group is able, to a limited
extent, to perceive positive feelings but seems practically unable to discern
existing negative feelings.


TABLE 5
Conformity of Actual to Perceived Choices of Entire Group

                      Actual Choices (Y)
Perceived
Choices (X)         Y         Ȳ       Total
X                  19         8         27
X̄                  12        51         63
Total              31        59         90

$\Gamma_1$ (conformity of actual to perceived) = .55,
$\Gamma_2$ (conformity of perceived to actual) = .45,
C (concordance between actual and perceived) = .50.
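The group computations for Table 5 can be retraced with a short sketch. The two index formulas below are an assumption: they are the forms that reproduce the reported .55 and .45 from the table's cell counts, not a quotation of the authors' equations, and the variable names are illustrative.

```python
from math import copysign, sqrt

n = 10                      # group size
N = n * (n - 1)             # ordered pairs: 90
n_xy, n_x, n_y = 19, 27, 31  # cell count and margins from Table 5

bracket = N * n_xy - n_x * n_y           # common numerator of both indices
gamma_1 = bracket / (n_x * (N - n_y))    # actual conforming to perceived (assumed form)
gamma_2 = bracket / (n_y * (N - n_x))    # perceived conforming to actual (assumed form)

# Concordance: geometric mean of the two indices, carrying the sign of the bracket.
C = copysign(sqrt(gamma_1 * gamma_2), bracket)

# Approximate 5% criterion quoted in the text: |C| > 2/n.
threshold = 2 / n
print(round(gamma_1, 2), round(gamma_2, 2), round(C, 3), abs(C) > threshold)
```

The geometric mean computed from the unrounded indices comes out just under the reported .50, which is what one would expect if the published value was rounded.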


REFERENCES

1. Finney, D. J. The Fisher-Yates test of significance in 2 X 2 contingency tables.
Biometrika, 1948, 35, 145-156.
2. Katz, Leo. The test of the hypothesis of no association in the four-fold table in the light
of the Neyman-Pearson theory. Statistical Research Laboratory, Michigan State
College, RM-9, 1952.
3. Kendall, M. G. The advanced theory of statistics, Vol. I. London: C. Griffin & Co., 1947.


Manuscript received 12/7/52 


Revised manuscript received 3/26/53 




PSYCHOMETRIKA—VOL. 18, No. 3 
SEPTEMBER, 1953 


DIFFERENCES IN FACTOR CONTENT 
OF RIGHTS AND WRONGS SCORES* 


BENJAMIN FRUCHTER 
THE UNIVERSITY OF TEXAS 


The right-response scores and wrong-response scores of speeded aptitude
tests were factor analyzed to determine whether they differ in factorial
content. The information thus obtained was used to derive scoring formulas
that yield purer measures of a factor than do scoring formulas derived in
other ways.


I. Introduction 


Different formulas are used to score tests or other measuring devices 
for different purposes. One formula may be used to correct for guessing, 
another for maximizing the reliability, and still a third for maximizing the 
validity of a test for a specific criterion. As has been pointed out by Thurstone 
(6), application of these scoring formulas will affect the correlations of timed 
or speeded tests only. For tests in which every item is attempted and scored 
either right or wrong, the correlation between the number of right and the 
number of wrong responses is — 1.00. Consequently scores based on a formula 
that corrects the number of right responses on an untimed test by some 
function of the number of wrong responses will have the same correlations 
with other tests as scores based on the number of correct responses only. 
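The point about untimed tests can be checked numerically. The sketch below uses made-up scores (all values are hypothetical) for a test in which every item is attempted, so that wrongs equal items minus rights; any formula score of the form R − kW is then a linear function of R and correlates with a third variable exactly as R does.

```python
from math import sqrt


def pearson(a, b):
    """Product-moment correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


n_items = 50                                  # hypothetical test length
rights = [31, 44, 27, 38, 40, 22, 35]         # hypothetical rights scores
wrongs = [n_items - r for r in rights]        # every item attempted
other = [12, 19, 9, 15, 14, 11, 13]           # hypothetical scores on another test

formula = [r - 0.25 * w for r, w in zip(rights, wrongs)]

r_rights_wrongs = pearson(rights, wrongs)
r_rights_other = pearson(rights, other)
r_formula_other = pearson(formula, other)
print(r_rights_wrongs, r_rights_other, r_formula_other)
```

Once some items are left unattempted, as on a speeded test, wrongs are no longer determined by rights and the equality no longer has to hold.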

For tests administered under time-limit conditions, however, the wrongs
scores may be relatively independent of the rights scores. The data in Table
1, based on results of tests administered to aviation students during World
War II, are exhibited to illustrate that the relatively independent wrongs
scores derived from time-limit tests can have reliabilities and validities
comparable to those of the rights scores.

It was hypothesized that the low correlation between the right and
wrong responses of some speeded tests might indicate that different functions
were being measured by these two types of scores. For example, the probability
of arriving at the wrong answer to an item might be related to differences in
numerical ability, whereas the probability of arriving at the right answer
to the same item might be related to differences in reasoning ability and be
unrelated to differences in numerical ability.


TABLE 1
Correlations Between Rights and Wrongs Scores, Reliabilities, and Validities of
Time-Limit Tests Administered to Aviation Students*

                                           Alternate-Forms       Validity
                                             Reliability       Coefficients      Training
Test                               r_RW    Rights   Wrongs    Rights   Wrongs    Criterion
Map Distance                       -.07                         .98     -.18     navigator
Position Orientation               -.10                         .18     -.19     pilot
Object Recognition                 -.22      .84      .76       .26     -.07     pilot
Estimation of Length               +.25†     .65      .72       .13     -.01     pilot
Visualization of Maneuvers, C      -.35                         .94     -.20     navigator
Spatial Visualization, II          -.66                         .95     -.27     navigator

*Data obtained from Guilford (3).
†This unexpected positive correlation may be due, in part, to faulty reproduction of some of the test
items.


*This paper is a revision of a dissertation submitted in partial fulfillment of the
requirements for the Doctor of Philosophy degree at the University of Southern California,
[...] Psychological Association in September, 1949.
The writer is greatly indebted to Dr. J. P. Guilford for providing the [...]
distribution data from the files of the School of Aviation Medicine and for general guidance
throughout the study. He also wishes to express his appreciation to [...]
and Sgt. James R. MacDonald for computations, [...]
of the centroid factors, which were performed while the writer was a civilian employee of
the Air Training Command, Human Resources Research Center. The opinions expressed
are those of the writer and do not necessarily reflect the official views of the USAF.


II. Background 


Since rights and wrongs scores are relatively independent in some speeded 
tests, and both types of scores seem to contain reliable, valid variance, it 
would be of some interest to determine the nature of the differences between 
these two types of scores. Several studies have indicated the differing content 
of the rights and wrongs scores of a test. 

When the items of the Map Distance Test (3, 458-461), which had been
administered under time-limit conditions to a sample of aviation cadets,
were analyzed on the basis of the highest and lowest 27 per cent of the total 
scores obtained from the scoring formula S = R — 3W + 40, the mean phi 
(.32) based on the total number who had answered each item was higher 
than the mean phi (.10) based on the total group taking the test. These 
results are the reverse of what is usually expected from a highly speeded 
test, for which the phi’s based on those who answered an item usually would 
be near zero, whereas the phi’s based on the total group taking the test should 
regularly increase, from the last item which everyone had managed to reach, 
to the end of the test. As the speed element in a test decreases in importance, 
the discrepancy between phi's computed on the two bases decreases until, 
with a pure power test, it disappears. 

An item analysis, performed against the criterion of total rights only on
the Map Distance test, yielded a mean phi based on the total answering each
item of .18, and a mean phi based on the total group taking the test of .31.
Apparently the rights score is a speed score, whereas the formula score,
heavily weighted for wrongs, is a power score.

Further evidence that wrongs scores may measure functions not found
in rights scores is obtained from the analysis of tests designed to measure the
trait of carefulness (2, 3). When four tests designed to measure carefulness
were administered to a sample of aviation students, it was observed that
large numbers of wrong responses were made and that the distributions of
error scores had considerable range and variability. The rights and wrongs
scores of these tests were treated as separate variables
and correlated with the formula scores of a number of other tests. Factor
analysis of these correlations revealed a new factor, uniquely characteristic
of the wrongs scores of the carefulness tests. Had the error scores not been
analyzed separately, no factor resembling carefulness might have been
found in these tests.


III. Selection and Correlation of the Variables 


In order to determine the differential content of the rights and wrongs
scores of time-limit tests and to estimate the factor content of various scoring
formulas combining these two types of scores, a battery of forty-five
experimental tests was administered to a sample of unclassified aviation students
under time-limit conditions. The sample consisted of 8,158 male, unclassified
aviation students, mostly 18 and 19 years of age, and of average or above-average
intelligence. Separate rights, wrongs, and formula scores were
obtained for each test, and the product-moment intercorrelations for each
type of score were calculated. Because of the length of the battery no examinee
took all of the tests, and the N's for the correlations vary from 385 to 1,558,
with a median N of approximately 450. Each examinee also had taken a
battery of twenty-one classification tests, and the correlations of the formula
scores of these tests with each of the three types of scores (rights,
wrongs, and formula) of the experimental tests were available. It was desired
to select for analysis those tests whose wrongs scores were relatively
independent of their rights scores and also reliable. Although the correlations
between the rights and wrongs scores and reliabilities of those scores were
not available for this sample, they were available for some of the tests on
comparable samples, and were used as a guide in selecting variables for the
analysis. Only those wrongs scores whose distributions had sufficient
variability to justify confidence in the stability of their correlations with other
scores were selected for analysis. Twenty-four experimental tests were judged
to have sufficient variability, independence, and reliability in their wrongs
scores to merit analysis. To these were added the a priori formula scores of
seven classification tests of known factor content to help guide the rotations
and interpretations. The score distributions were normalized. A matrix
containing the intercorrelations of the wrongs scores of the twenty-four
experimental tests (with axes inverted to yield positive correlations with the
classification tests) and the formula scores of the seven reference tests was
obtained. Another matrix, consisting of the intercorrelations of the normalized
rights scores of the same twenty-four experimental tests and formula scores
of the same seven reference tests, was assembled to be analyzed for
comparison. Since the experimental tests were administered in overlapping
sub-batteries, the N's for the correlations vary*. It was considered justifiable to
include the correlation coefficients from the various overlapping samples in
one matrix for factor analysis since the aviation-student population is quite
homogeneous and individuals were randomly assigned to the various
sub-batteries.


IV. Analysis of the Data 


The correlation matrix for the twenty-four rights scores (variables 1 to 
24) and seven formula scores (variables 25 to 31) was analyzed by the centroid 
method to nine factors, and the factors were rotated to meaningful positions. 
The resulting centroid and rotated factor loadings are shown in Table 2. 
Similar data for the nine factors that were extracted from the wrongs scores 
are shown in Table 3. 

The first eight rotated factors were interpreted alike in both analyses 
and are as follows: 


I. Spatial orientation
II. Visualization
III. Associative memory
IV. Numerical facility
V. Verbal comprehension
VI. Reasoning
VII. Visual memory
VIII. Length estimation

The ninth factor in the rights analysis was considered to be a residual,
the loadings ranging from −.20 to .18. The ninth factor in the wrongs analysis
is a triplet, and although its nature is not obvious, it would be of interest to
identify it, since it may represent a function unique to error scores. The tests
with the highest loadings on this factor previously have appeared on factors
labeled "sequential reasoning" and "integration III" (3, 824 and 833). If the
former interpretation should prove to be the correct one, it is probably related
to the reasoning-by-elimination which is sometimes used to select the answers
to multiple-choice items.

*[...] Documentation Institute. Order Document No. 3954 from American Documentation
Institute, Auxiliary Publications Project, Photoduplication Service, Library of Congress,
Washington 25, D. C., remitting $1.25 for microfilm (images 1 inch high on standard 35 mm.
motion picture film) or $1.25 for photoprints readable without optical aid.


TABLE 2
Rotated and Centroid Loadings: Rights Scores*
(Decimal points omitted)

[Table body not recoverable: loadings of variables 1 to 31 on rotated factors I to IX and centroid factors I to IX.]

*For reference variables 25 through 31, formula scores were used throughout.
Descriptions of the tests, including sample items, [...] and reliability and validity data, can be found in ([...]); [...] with the exception of variables 24 and 30, can be found in ([...]).


TABLE 3
Rotated and Centroid Loadings: Wrongs Scores*
(Decimal points omitted)

[Table body not recoverable: loadings of variables 1 to 31 on rotated factors I to IX and centroid factors I to IX.]

*For reference variables 25 through 31, formula scores were used throughout.


V. Discussion 


As might be expected from their low correlation, the factor content of 
the rights and wrongs scores of some tests is quite different. Thus, for the 
Planning Air Maneuvers test, the principal loading for the rights score is on 
the reasoning factor, and for the wrongs score on the visualization factor. 
For the Visualization of Maneuvers test, the rights have a higher loading 
than the wrongs on the spatial-orientation factor, while the wrongs are more 
heavily loaded on the visualization factor. 

Similarly, for several of the factors, some of the loadings in the wrongs 
analysis were higher than the corresponding loadings in the rights analysis. 
This seems to be especially true of the visualization factor, on which twelve 
out of fourteen experimental tests had higher loadings for their wrongs scores 
than for their rights scores. For tests such as Planning Air Maneuvers, Map 
Distance, and Position Orientation, reasoning, length-estimation, or numerical 
abilities may have been used in arriving at the correct answers. Apparently, 
wrong answers frequently were arrived at because of incorrect visualization. 
The higher loadings of the wrongs scores may indicate that they are good 
measures of the visualization factor. Other factors that have higher loadings 
for the wrongs scores of some tests than for the corresponding rights scores
are memory, number, and reasoning. Presumably for these tests the variance 
of the wrongs scores is more dependent on individual differences on these 
factors than is the variance of the corresponding rights scores. 

The question of whether the loadings of the formula scores can be
predicted from the weighting of the rights and wrongs in the formula might also
be raised.*

Unfortunately, all of the required data were not available for this sample,
but the necessary values (which in addition to the loadings are the standard
deviations of the rights and wrongs scores and the correlations between the
rights and wrongs) were available on comparable samples for some of the
tests. The results of the attempt to estimate the loadings of the formula scores,
on certain factors, from the loadings of the rights and wrongs scores on these
factors, by means of the formula for the correlation of a weighted sum with
a third variable, are shown in Table 4. For some factors the estimated loadings
agree well with the obtained loadings, while for other factors the discrepancies
are larger, possibly indicating that the cross-identification of the factors does
not hold in these cases. In evaluating the results it should also be borne in
mind that some of the values used in the calculations were estimated from
other samples rather than obtained from this sample, and also that there was
a larger number of factors in the formula-score battery analysis (due to the
larger variety of tests included) than was brought out in the analyses of the
intercorrelations of the rights and wrongs scores.

*These loadings had been determined for all but one of the experimental tests in
Guilford, Fruchter, and Zimmerman (5).
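The standard formula for the correlation of a weighted sum with a third variable can be sketched as follows. The function name and every numerical value below are hypothetical, chosen only to show the computation; they are not the standard deviations, correlations, or loadings used in the study. The wrongs loading is entered in the original (uninverted) orientation of the wrongs score, which is an assumption of this sketch.

```python
from math import sqrt


def weighted_sum_loading(a, b, sd_r, sd_w, r_rw, load_r, load_w):
    """Correlation of S = a*R + b*W with a factor F.

    load_r and load_w are the correlations (loadings) of R and W with F,
    sd_r and sd_w their standard deviations, r_rw their intercorrelation.
    """
    numerator = a * sd_r * load_r + b * sd_w * load_w
    variance_s = (a * sd_r) ** 2 + (b * sd_w) ** 2 + 2 * a * b * sd_r * sd_w * r_rw
    return numerator / sqrt(variance_s)


# Hypothetical inputs for a formula of the form S = R - 3W.
estimate = weighted_sum_loading(
    a=1.0, b=-3.0, sd_r=6.0, sd_w=3.0, r_rw=-0.07, load_r=0.40, load_w=-0.30
)
print(round(estimate, 3))

# Sanity check: with zero weight on wrongs the loading is that of the rights score.
rights_only = weighted_sum_loading(1.0, 0.0, 6.0, 3.0, -0.07, 0.40, -0.30)
```

An additive constant in the scoring formula, such as the +40 in S = R − 3W + 40, does not enter because it changes neither the variance nor the covariance.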


TABLE 4
Comparison of the Obtained Factor Loadings of the Formula Scores
with Loadings Predicted from the Factor Content of the Rights and Wrongs Scores

                                                           Estimated        Obtained
                                                           Formula-         Formula-
                                                           Score            Score
Test                           Factor                      Loading          Loading*
 1. Map Memory                 Visual memory                 .42            .50    .43
                               Paired-associates memory      .42            .16    .26
 7. Visualization of           Spatial orientation        [illegible]       .59    .98
    Maneuvers, Form C          Visualization                 .51        [illegible] .26
11. Spatial Visualization I    Visualization                 .56            .63    .60
14. Aerial Orientation         Spatial orientation        [illegible]       .61    .62
19. Plane Name Memory          Paired-associates memory      .51            .16    .12

*The loading on the left is from solution I, and the loading on the right is from solution II in J. P.
Guilford, B. Fruchter, and W. S. Zimmerman (5).


If the rights and wrongs scores of a test are relatively independent and
have their loadings largely on different factors, there are empirical methods
for determining useful scoring-formula weights. Guilford and Michael (4)
have developed formulas to determine the weights which, if applied to separate
scores, would maximize the variance on a desired factor, minimize the variance
on an undesired factor, or establish a specified ratio between them. Applying
their formula (1) to the Map Distance test indicates that the scoring formula
which would maximize the length-estimation variance of that test is
(S = R + W). The loading of this score, as estimated by the formula, is .67 on the
length-estimation factor and .15 on the visualization factor, whereas the
loading of the empirically-derived, optimally-valid scoring formula
(S = R − 3W + 40) is reported to be .38 on the visualization factor and .30 on
the length-estimation factor (cf. 3, 458). This test had been constructed
because of interest in its length-estimation variance. Actually, a good measure
of length-estimation ability had been constructed. Choice of a scoring formula
that did not maximize the length-estimation variance, however, made it
appear to be another test of just moderate validity with its major variance
on the visualization factor which, although valid for the criterion then under
consideration, is better measured by other tests.


Another example of the importance of proper weighting of speeded tests
can be given in connection with the Estimation of Length test. Estimating
line lengths seems to enter into many perceptual and some visualization tasks.
This test was constructed on the hypothesis that if the interpretation of the
function represented by this factor is correct, the test should be a relatively
pure measure of that function. It is a five-alternative multiple-choice test,
and the conventional a priori weighting given the wrongs would be −.25.

Applying the formula for maximizing the desired variance to the data
of the Estimation of Length test indicates that the scoring formula
(S = R + .75W) would maximize the length-estimation variance and give the
most useful information from the scores on this test. Application of the
correlation-of-weighted-sums formula indicates that this weighting would
yield a loading of .70 on the length-estimation factor, whereas the a priori
weighting (S = R − .25W) gives an estimated loading of .40 on the
length-estimation factor. Thus, with the optimal weighting, more than three times as
much of the desired variance would be obtainable.
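The "more than three times" figure follows from squaring the two loadings, on the usual reading that the proportion of a score's variance attributable to a factor is the square of its loading on that factor. A minimal check, with illustrative variable names:

```python
optimal_loading = 0.70    # estimated loading for S = R + .75W
a_priori_loading = 0.40   # estimated loading for S = R - .25W

# Proportion of score variance on the length-estimation factor under each weighting.
optimal_variance = optimal_loading ** 2
a_priori_variance = a_priori_loading ** 2

ratio = optimal_variance / a_priori_variance
print(f"{optimal_variance:.2f} / {a_priori_variance:.2f} = {ratio:.2f}")
```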


REFERENCES

1. Fruchter, B. The factorial content of right-response scores and wrong-response scores
of a battery of experimental aptitude tests. Unpublished Ph.D. dissertation, University
of Southern California, 1948.
2. Fruchter, B. Error scores as a measure of carefulness. J. educ. Psychol., 1950, 41,
279-291.
3. Guilford, J. P. (Ed.) Printed classification tests. Washington: U.S. Government Printing
Office, 1947.
4. Guilford, J. P. and Michael, W. B. Approaches to univocal factor scores. Psychometrika,
1948, 13, 1-22.
5. Guilford, J. P., Fruchter, B., and Zimmerman, W. S. Factor analysis of the Army Air
Forces Sheppard Field battery of experimental aptitude tests. Psychometrika, 1952, 17,
45-68.
6. Thurstone, L. L. The reliability and validity of tests. Ann Arbor, Mich.: Edwards
Brothers, 1931.


Manuscript received 1/11/52 


Revised manuscript received 3/7/53




PSYCHOMETRIKA—VOL. 18, No. 4 
DECEMBER, 1953 


WHO BELONGS IN THE FAMILY?* 
ROBERT L. THORNDIKE
TEACHERS COLLEGE, COLUMBIA UNIVERSITY 


I was sitting before my TV set, a while back, watching Captain Video
and pondering the organizational problems of psychologists, psychometricians,
psychodiagnosticians, psycho-somatists, psychosomnabulists, and
psycho-ceramics (crack-pots to you). Wondering what I might do, in my small way,
to help out, I decided to enlist Captain Video's help to bring me from the
Black Planet that super-galactian hypermetrician, Dr. Idnozs Heahscror-Tenib,
cosmos-famous discoverer of Serutan.

Why delay? The Galaxy was on its way, and in half a light year Dr. Tenib 
was at my side prepared to devote his gargantuan talents to the task. 

Seeing no point in confusing the good doctor by trying to describe to him 
the present administrative hodge-podge, I said, “Doctor, let's start from 
scratch. I want you to find out for me how these good people who are present 
at the annual meeting of the APA structure themselves? What families are 
represented? How many, or better, how few? And who belongs to each?” 

“We proceed," said the Doctor. “Bring sample of population; I measure." 

So we set out to design a sample. The problem presented some interesting 
theoretical aspects, but the final solution was relatively simple. We stationed 
representatives at each of the three state beverage stores and followed every 
third badge-wearing individual who came out of a store. We selected only out- 
going patrons for obvious reasons. 

After assisting each respondent to unburden himself, we brought him to
Dr. Idnozs (as we came to call him among ourselves) for study.

"Now," murmured the Doctor, "we give tests. First is 'Draw-a-Psychiatrist
Test.'"

"We score this," he confided, "by if it gives horns."

Presently we started on the physiological test battery.

"We draw off saliva drop by drop," explained our idiot savant, "and see
does he drool when we bring in Skinner Box."

Later came the Peculiar Preference Blank.

"Forced-choice, you know," whispered the Doctor. "Would you rather
make mud pies or kiss gorgeous blonde?"


*Presidential address to the Psychometric Society, September 7, 1953. 


“Doctor,” I said, "let's not get personal." 

Time will not permit a full description of the Doctor's ingenious test 
battery. It will be fully elaborated in a forthcoming issue of the Journal of 
Ortho-Personometrics. Needless to say, the tests were all orthogonal, com- 
pletely diagnostic, of highest reliability, and representative of the fundamental 
dimensions of psycho-personality (the personality of psychologists and psy- 
chopaths.) 

I must also skip over with only passing mention the unique procedures 
by which the Doctor established fundamental equal-unit scales for the different 
dimensions included in his battery, and how he provided for equivalence of 
metric from one dimension to another. 

“Is simple,” said the good Doctor. “Take a number from one to ten. Is
a score. Single digit. Standardized. When I say one equals one, one equals
one.”

“What now, Doctor?” I asked. “Do we run a Q-type factor analysis to
locate the dimensions and clusters in our sample?”

“Is no good,” replied my mentor. “Neglects differences in score level.
Washes out differences in variability. Indicates dimensions, but doesn't
locate boundary of clusters.”

“Well, then, shall we calculate a multiple discriminant function?”

“No good. Have no a priori groups. Multiple discriminant only perpetuates
sins of the fathers. (Remind me I tell you sometime about my father.)
Tells which Division to put man in. Not tell what Divisions should be.”

"What then?" 

“We run cluster analysis. Find distances between sheep and goats. 
Assign to clusters so that average of distances within cluster is minimum, 
when summed over all clusters. Define families, boundaries, and family 
membership like so." 

And so that is what we did. We had the set of scores for each person. As
I mentioned before, thanks to Dr. T's skill they had been designed so that
they were orthogonal measures, so we didn’t have to worry about the effects
of covariance. And we were also fortunate in that the problems of a metric
had been worked out for us by the giant brain. It was, therefore, a simple step
to express the “distance” between any pair of persons as the square root of
the sum of squares of the score differences on each one of the tests. The
problem that remained was merely that of selecting from the N-square matrix
of between-persons distances k sub-sets chosen in such a way that the average
of the distances within sub-sets, summed over all k of them, was a minimum.

“Have showed you how,” said the Doctor. “Now I go. Is dinner time on
Black Planet.”

“But, Doctor,” I expostulated, “how do I go about identifying the optimum
k sub-sets?”


“Is easy. Finite number of combinations. Only 563 billion billion billion.
Try all. Keep best.”

I acknowledged the cogency of his method, then rallied feebly for one last
question.

“But, Doctor, how shall I tell how many families there are? How many
clusters there should be?”

“Is dinner time. Don't bother me.” And the good Doctor vanished rapidly
into the stardust of outer space.


Dr. T had departed, but the problem we had faced together lingered with 
me. 
Suppose we have a set of specimens—of psychologists, of psychopaths, 
of jobs, or whatever. Suppose we have a set of measures of each person, job, 
or the like. Suppose for the moment that questions which may be raised about 
the representativeness of the measures, their independence, their metrics 
have all been satisfactorily answered. Suppose that we have computed a 
scalar distance between each of the specimens in the m-space represented by 
our m measures. Suppose we wish to subdivide our N specimens into k subsets 
in such a way that the subsets shall be as compact and homogeneous as 
possible. Suppose we define compactness by requiring that the average of all
the distances between specimens within the same subset shall be a minimum.
That is, we want the members of each family to be as much alike as possible
with respect to the set of measures which we have elected to study. How, then,
shall we decide upon the value of k, the number of families or clusters? Is
there any meaningful way of defining an appropriate, or natural, or “optimum” 
number of clusters? And once k has been determined, how shall we decide 
upon the boundaries and the centroids of the various clusters? How shall we 
tell where one should end and the next begin? Who belongs in which family? 
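
The compactness criterion just stated can be made concrete in a few lines. The sketch below is illustrative only: the score profiles and the helper names `distance` and `average_within` are hypothetical, the distance is assumed to be the Euclidean one described earlier in the address (square root of the sum of squared score differences), and "the average of all the distances between specimens within the same subset" is read as a pooled average over every within-subset pair.

```python
from itertools import combinations
from math import sqrt

def distance(u, v):
    """Euclidean distance between two score profiles of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def average_within(scores, clusters):
    """Pooled mean of all pairwise distances between members of the same cluster."""
    d = [distance(scores[i], scores[j])
         for members in clusters
         for i, j in combinations(members, 2)]
    return sum(d) / len(d)

# Hypothetical profiles: four specimens measured on m = 2 orthogonal scales.
scores = [(0, 0), (3, 4), (10, 0), (10, 5)]

tight = average_within(scores, [[0, 1], [2, 3]])   # neighbours grouped together
loose = average_within(scores, [[0, 2], [1, 3]])   # neighbours split apart
print(tight, loose)
```

Under this reading the first grouping is the more compact of the two, since its pooled within-cluster average is the smaller.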

These appear to be genuine problems, with real meaning in a number of 
practical contexts. Some solution must be arrived at by the dress designer
engaged in manufacturing clothes, who must decide on the number of different
sizes for women’s clothes and the dimensions for each. Some solution must be
reached by the military personnel specialist who must identify groups and
families of jobs in the military services in planning testing batteries, classification
systems, and career guidance programs. A solution is implied in the work
of those sociologists who undertake to identify the class structure of a com- 
munity and delimit the class membership of individuals. 

Let us start with the second problem first, because it looks somewhat more 
docile and amenable to attack. The problem is: For a given value of k, how 
shall we assign N specimens to k categories so that the average of the
within-categories distances will be a minimum? So that there will be as much likeness
within families and as much difference between families as possible?

Dr. T. has already given us the simon-pure mathematician’s answer. The
number of combinations is finite. Try them all and pick the best. But that 
solution is not very comforting. Though finite to the mathematician, the 
number of combinations is without limit for the man who must work with the 
data. With only 10 specimens and two clusters, the number of possible combinations
is over a thousand, and the number increases at a rapidly accelerating
rate with increase in either N or k.
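
The count quoted above is easy to check. With N specimens and k labeled clusters there are k^N possible assignments, which for N = 10 and k = 2 gives 1,024, or "over a thousand." If cluster labels are treated as interchangeable and empty clusters are ruled out, the count is instead a Stirling number of the second kind; that second reading is an assumption added here for comparison, not something the address states. The function names are illustrative.

```python
from math import comb, factorial

def labeled_assignments(n, k):
    """Ways to give each of n specimens one of k cluster labels."""
    return k ** n

def distinct_partitions(n, k):
    """Stirling number of the second kind, by inclusion-exclusion."""
    total = sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))
    return total // factorial(k)

print(labeled_assignments(10, 2))   # 1024
print(distinct_partitions(10, 2))   # 511
print(distinct_partitions(12, 3))   # 86526
```

Either way of counting grows very quickly with N and k, which is the point of the passage.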

The mathematicians in my family also assure me that there is no analytic 
mathematical solution to this problem. We appear to be thrown back on 
iterative approximation procedures.

The exploratory work we have done suggests that such procedures can be 
developed in a form which is not too laborious, and which converges relatively 
promptly to a stable solution. From here on in, I would like to illustrate the 
process with a miniature set of data from analyses which we have been doing
with a view to defining more rationally the family relationship of Air Force 
jobs. These particular data have a number of shortcomings, so no particular 
weight should be attached to the substantive results. 

The basic data consist of the ratings of each of 12 Air Force job categories
with respect to 19 dimensions. The dimensions were selected on the basis of a
rather extensive correlational analysis of 130 attributes which have been 
applied to jobs in job descriptions and elsewhere. The 19 dimensions were 
chosen as being to a large extent mutually independent, fairly reliably rated, 
significant for a number of Air Force jobs, and differentially significant for 
different jobs. 

The average rating of each job on a scale from 0 to 9 is shown in Table 1.
Ratings of the jobs were made by four or more supervisory non-coms. The
inter-job distances are presented in Table 2. We report here only one particular
case: that of three families for the set of 12 jobs.

Our procedure is to assume that the two jobs which are at the greatest
distance from one another will axiomatically fall in different families. The
third cluster starts with the job which is least near to either of the other two.
Each cluster is built up by adding on that specimen which is nearest to the one
which initially defined the cluster. A specimen is added to each cluster in turn
and the cycle is repeated until all specimens are assigned. We then have a set
of initial clusters of equal size, and we can determine for each specimen its
average distance from the members of its own cluster and of the other clusters.
This situation is shown in Table 3.

Generally speaking, a specimen is mis-assigned if it is nearer to the members
of another cluster than to the members of its own cluster. This is
illustrated by the job of General Instructor. Cases of this sort are reassigned,
one at a time, starting with the most obvious misfits, and the average distances
are recomputed after each assignment. (This is actually a good deal less
laborious than it sounds.) Shifts are made until there is no further shift which


TABLE 1
Average Ratings of 12 Air Force Specialties on Requirement of 19 Attributes

Air Force Specialty (columns, in the job order of Table 2): 1 Radio Mechanic,
2 Aircraft Mechanic, 3 Cook, 4 Supply Technician, 5 Petroleum Supply Technician,
6 Clerk, 7 Career Guidance Specialist, 8 Personnel Specialist, 9 General
Instructor, 10 Budget & Fiscal Clerk, 11 Medical Corpsman, 12 Air Policeman.

Attribute                       1    2    3    4    5    6    7    8    9   10   11   12
 5 Manipulative Ability       8.5  5.4  4.5  4.4  5.0  6.8  4.8  6.4  1.8  6.2  5.0  2.8
 9 Foot-Hand Coordination     1.8  3.6  3.0  2.8  7.5  0.4  0.2  0.1  2.8  0.2  3.5  5.4
   ... Controls               6.8  5.8  2.8  2.2  6.5  1.2  1.8  0.0  0.5  1.2  4.2  3.8

Other attribute labels that survive: 6 Responsibility for Work of Others,
7 Emotional Control, 8 Speed, 10 Work under Dangerous Conditions,
11 Clerical Perception, 12 Concentration amid ... The remaining labels and
ratings are illegible in this copy.




TABLE 2
Inter-Job Distances of 12 Air Force Jobs*

Air Force Jobs                      1    2    3    4    5    6    7    8    9   10   11   12
 1 Radio Mechanic                   —   62   99   96  104  118  128  137  134  140   84  105
 2 Aircraft Mechanic               62    —   73   75   78  104  109  125  119  130   55   95
 3 Cook                            99   73    —   51   95   84   90   95   83  118   50   84
 4 Supply Technician               96   75   51    —   99   64   67   77   67   89   60   75
 5 Petroleum Supply Technician    104   78   95   99    —  149  153  166  142  167   64   79
 6 Clerk                          118  104   84   64  149    —   35   28   83   60  101  125
 7 Career Guidance Specialist     128  109   90   67  153   35    —   41   80   58  109  129
 8 Personnel Specialist           137  125   95   77  166   28   41    —   80   57  119  141
 9 General Instructor             134  119   83   67  142   83   80   80    —  100  108   93
10 Budget & Fiscal Clerk          140  130  118   89  167   60   58   57  100    —  132  145
11 Medical Corpsman                84   55   50   60   64  101  109  119  108  132    —   76
12 Air Policeman                  105   95   84   75   79  125  129  141   93  145   76    —

*Multiplied by 10 to remove decimal.

TABLE 3
Initial Grouping Into Three Clusters, Showing Cluster Membership and
Average Distance of Each Job from Jobs in Each Cluster

                                              Clusters
Job                                 A: Jobs        B: Jobs         C: Jobs
                                    1, 2, 4, 9     3, 5, 11, 12    6, 7, 8, 10
 1 Radio Mechanic                     97*             98             131
 2 Aircraft Mechanic                  85*             75             117
 3 Cook                               76              76*             97
 4 Supply Technician                  79*             71              74
 5 Petroleum Supply Technician       106              79*            159
 6 Clerk                              92             115              41*
 7 Career Guidance Specialist         96             120              45*
 8 Personnel Specialist              105             130              42*
 9 General Instructor                107*            106              86
10 Budget & Fiscal Clerk             115             140              59*
11 Medical Corpsman                   72              63*            115
12 Air Policeman                      97              80*            135

*Asterisk indicates cluster to which each job is assigned.


will reduce the average of all the within-cluster distances. This is the situation
which we find in Table 4. This appears to be a uniquely best assignment of the
12 jobs to three families, in the sense that we have defined best.
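
The procedure described above can be sketched in code, using the distances of Table 2. Several details are assumptions made for this illustration rather than statements from the address: the third seed is taken to be the job whose smaller distance to the two existing seeds is largest; "the average of all the within-cluster distances" is taken as a pooled average over all within-cluster pairs; and at each step the single shift that lowers that average most is made, in place of the "most obvious misfits first" rule. The function names are hypothetical.

```python
from itertools import combinations

# Inter-job distances from Table 2 (multiplied by 10); jobs 1..12 are indices 0..11.
D = [
    [0, 62, 99, 96, 104, 118, 128, 137, 134, 140, 84, 105],
    [62, 0, 73, 75, 78, 104, 109, 125, 119, 130, 55, 95],
    [99, 73, 0, 51, 95, 84, 90, 95, 83, 118, 50, 84],
    [96, 75, 51, 0, 99, 64, 67, 77, 67, 89, 60, 75],
    [104, 78, 95, 99, 0, 149, 153, 166, 142, 167, 64, 79],
    [118, 104, 84, 64, 149, 0, 35, 28, 83, 60, 101, 125],
    [128, 109, 90, 67, 153, 35, 0, 41, 80, 58, 109, 129],
    [137, 125, 95, 77, 166, 28, 41, 0, 80, 57, 119, 141],
    [134, 119, 83, 67, 142, 83, 80, 80, 0, 100, 108, 93],
    [140, 130, 118, 89, 167, 60, 58, 57, 100, 0, 132, 145],
    [84, 55, 50, 60, 64, 101, 109, 119, 108, 132, 0, 76],
    [105, 95, 84, 75, 79, 125, 129, 141, 93, 145, 76, 0],
]

def initial_clusters(D, k):
    """Seed with the farthest pair, add further seeds, then grow clusters in turn."""
    n = len(D)
    seeds = list(max(combinations(range(n), 2), key=lambda p: D[p[0]][p[1]]))
    while len(seeds) < k:
        rest = [i for i in range(n) if i not in seeds]
        seeds.append(max(rest, key=lambda i: min(D[i][s] for s in seeds)))
    clusters = [[s] for s in seeds]
    free = [i for i in range(n) if i not in seeds]
    while free:
        for members in clusters:
            if free:
                nearest = min(free, key=lambda i: D[i][members[0]])
                members.append(nearest)
                free.remove(nearest)
    return clusters

def pooled_average(D, clusters):
    """Mean distance over all pairs of jobs that share a cluster."""
    pairs = [(i, j) for c in clusters for i, j in combinations(c, 2)]
    return sum(D[i][j] for i, j in pairs) / len(pairs)

def refine(D, clusters):
    """Repeatedly make the single shift that lowers the pooled average most."""
    clusters = [list(c) for c in clusters]
    while True:
        best_score, best_trial = pooled_average(D, clusters), None
        for src, members in enumerate(clusters):
            if len(members) == 1:
                continue  # never empty a cluster
            for item in members:
                for dst in range(len(clusters)):
                    if dst == src:
                        continue
                    trial = [list(c) for c in clusters]
                    trial[src].remove(item)
                    trial[dst].append(item)
                    score = pooled_average(D, trial)
                    if score < best_score:
                        best_score, best_trial = score, trial
        if best_trial is None:
            return clusters
        clusters = best_trial

def as_job_numbers(clusters):
    return sorted(sorted(i + 1 for i in c) for c in clusters)

start = initial_clusters(D, 3)
final = refine(D, start)
print(as_job_numbers(start))
print(as_job_numbers(final))
print(round(pooled_average(D, start), 1), round(pooled_average(D, final), 1))
```

Under these assumptions the starting clusters match the memberships of Table 3, and three shifts (General Instructor, then Supply Technician, then Petroleum Supply Technician) lead to the memberships of Table 4, with the pooled average falling from about 71.1 to about 66.4. A different tie-breaking or ordering rule could, in general, stop at a different grouping.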


TABLE 4
Final Grouping into Three Clusters

                                              Clusters
Job                                 A: Jobs        B: Jobs         C: Jobs
                                    1, 2, 5        3, 4, 11, 12    6, 7, 8, 9, 10
 1 Radio Mechanic                     83*             96             131
 2 Aircraft Mechanic                  70*             74             117
 3 Cook                               89              62*             94
 4 Supply Technician                  90              62*             78
 5 Petroleum Supply Technician        91*†            84             155
 6 Clerk                             124              94              52*
 7 Career Guidance Specialist        130              99              54*
 8 Personnel Specialist              143             108              52*
 9 General Instructor                132              88              86*
10 Budget & Fiscal Clerk             146             121              69*
11 Medical Corpsman                   68              62*            114
12 Air Policeman                      93              78*            127

*Asterisk indicates cluster to which each job is assigned.
†Job 5 (Petroleum Supply Technician) is assigned to Cluster A rather than Cluster B because, due to the
small size of the cluster, it has less effect on the over-all average distance in that cluster than it would in the
larger Cluster B.


The nature of a given family can best be defined by computing the cen- 
troid of the jobs which make up the family. These centroids are shown in 
Table 5. Thus, Family A is made up of jobs which call for relatively high 
amounts of familiarity with tools, manipulative ability, spatial judgment, and 
facility in manipulating multiple controls. Family B, by contrast, emphasizes 
social adaptability and ability to take responsibility for the work of others. 
Family C is the one that is highest on clerical perception, arithmetic computa- 
tion, and fluency of expression, and is very low on strength, coordination, and 
the like. Factors which do not serve to differentiate any of the families to an 
appreciable extent are accuracy, emotional control, speed, concentration amid 
distractions, induction, and flexibility. The dimensions which differentiate 
between the clusters provide initial hypotheses as to dimensions important for 
a personnel classification program, and the extent to which a given factor
differentiates is a cue to its significance for such a program.
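
As a small illustration of how such centroids are formed, the sketch below averages three attribute rows over the members of each final family. It rests on assumptions: only three rows of Table 1 are legible in this copy, their values are read from the scan and may contain transcription errors, the columns are assumed to follow the job order of Table 2, and the label "Multiple Controls" is a guess at a row whose legible label ends in "Controls."

```python
# Three rows read from Table 1 (ratings on a 0-9 scale), jobs 1..12 in order.
ratings = {
    "Manipulative Ability":   [8.5, 5.4, 4.5, 4.4, 5.0, 6.8, 4.8, 6.4, 1.8, 6.2, 5.0, 2.8],
    "Foot-Hand Coordination": [1.8, 3.6, 3.0, 2.8, 7.5, 0.4, 0.2, 0.1, 2.8, 0.2, 3.5, 5.4],
    "Multiple Controls":      [6.8, 5.8, 2.8, 2.2, 6.5, 1.2, 1.8, 0.0, 0.5, 1.2, 4.2, 3.8],
}

# Final families from Table 4, by job number.
families = {"A": [1, 2, 5], "B": [3, 4, 11, 12], "C": [6, 7, 8, 9, 10]}

def centroid(family_jobs):
    """Mean rating on each attribute over the jobs in one family."""
    return {attr: sum(row[j - 1] for j in family_jobs) / len(family_jobs)
            for attr, row in ratings.items()}

centroids = {name: centroid(jobs) for name, jobs in families.items()}
for name, c in centroids.items():
    print(name, {attr: round(v, 1) for attr, v in c.items()})
```

On these three rows Family A has the highest mean on each attribute and Family C the lowest on coordination and controls, which agrees with the verbal description of the families.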

The approximation procedure for arriving at the optimum definition of
clusters for a specified value of k, the number of clusters, seems moderately
satisfying. Now we must face the much nastier problem of determining the


appropriate value for k. Into how many families should the specimens be 
grouped? 


TABLE 5
Average Weights for Three Job Clusters on Each of 19 Attributes
(Columns: Attribute; Clusters A, B, C. The tabled weights are illegible in this copy.)

Obviously, every increase in the number of families results in some reduction
in the average distance within families, just as every addition of a variable
to a multiple regression equation results in some further increase in the value
of the multiple correlation. The more pieces into which we chop our m-space,
the shorter the distances within each. How are we to decide when to stop?
Here I must admit that I am stumped.

As I have indicated, with every increase in k there will be a decrease in
the average within-cluster distance (which we may call A). The manner in
which the distance decreases for our illustrative example is shown in Figure 1.
Ideally, one would like some type of significance test of the change in A as k
increases from 2 to 3 to 4, and so on. But I am unable to produce such a test.
Furthermore, I suspect that if one could be developed it would involve an
assumption of normality of the distributions of the specimens in the various
dimensions. This assumption is in fairly direct conflict with the notion of
families, or clusters, or types. In the one case, we assume continuous unimodal
distributions. In the other case, we are interested in foci, in more dense
concentrations of specimens in certain limited regions. It is when such concentrations
exist that a distinctively “best” set of families will be found.

FIGURE 1
Average Within-Cluster Distance for Different Numbers of Clusters
(Based on distances for 12 Air Force jobs)
[Axes: Number of Clusters (horizontal); Average Within-Cluster Distance (vertical)]

One might examine the drop in A with the increase in k, using a diagram
such as Figure 1. Intuitively, it seems that a sudden marked flattening of the
curve at any point should identify a distinctively “right” value of k. That is,
this should be a point at which the number of families uniquely corresponds
to the configuration of points, since there is relatively little gain from further
increase in the number of clusters. I have tried to test this out empirically,
using synthetic data. That is, I have built up sets of points which were distributed
around a known number of specific foci, with random variation away
from these foci introduced, and then determined the clusters for successively
larger values of k. The results for three examples are shown in Figure 2. The
curves do not provide much support for the intuitive specification of the
number of clusters.
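
The intuitive "flattening" criterion can be tried on a toy case. The sketch below is not the synthetic study reported in the address: it uses eight hypothetical points on a line, tightly grouped around three widely separated foci with no random scatter, and finds the best grouping for each k by exhaustive search, which is only feasible because the set is tiny. The pooled average of within-cluster distances is again an assumed reading of the criterion, and the function names are illustrative.

```python
from itertools import combinations, product

def pooled_average(points, labels, k):
    """Mean |x - y| over pairs sharing a label; None if a label is unused or no pair exists."""
    if len(set(labels)) < k:
        return None
    d = [abs(points[i] - points[j])
         for i, j in combinations(range(len(points)), 2)
         if labels[i] == labels[j]]
    return sum(d) / len(d) if d else None

def best_for_k(points, k):
    """Try every assignment of points to k clusters and keep the most compact one."""
    best = None
    for labels in product(range(k), repeat=len(points)):
        score = pooled_average(points, labels, k)
        if score is not None and (best is None or score < best[0]):
            best = (score, labels)
    return best

points = [0, 1, 2, 50, 51, 52, 100, 101]   # hypothetical data with three foci
curve = {k: best_for_k(points, k)[0] for k in range(1, 5)}
groups = best_for_k(points, 3)[1]

for k, value in curve.items():
    print(k, round(value, 2))
```

With foci this well separated the curve drops steeply up to k = 3 and is nearly flat afterwards. Real data with scatter around the foci need not behave so cleanly, which is consistent with the author's report that his own synthetic curves gave the criterion little support.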




Finally, one might specify the number of clusters simply by administrative
fiat, in terms of purely practical considerations. Thus, one might decide
that practical limitations in maintaining records, scoring tests, making assignments
and the like limit one to no more than six different appraisals of the
individual, and rule that the number of appraisals shall be six. One would then
set out to delimit six clusters in such a way that within the six a maximum of
compactness resulted. (The correlative result is that there is a maximum of
variance between the centroids of the clusters.) One might then apply multiple
discriminant analysis to one's test battery to find test weights which would
maximally differentiate the clusters.

FIGURE 2
Average Within-Cluster Distance for Different Numbers of Clusters
(Data for three synthetic examples built around four foci)
[Axes: Number of Clusters (horizontal); Average Within-Cluster Distance (vertical); one point is labeled “critical point.”]


At this point I can sense the bubbling up of doubts and questions: “But
what about your units?” . . . “How can you decide what dimensions to use?”
. . . “What about the error variance in the location of any single specimen?” . . .
“What has all this got to do with the organization of psychological associations?”

I can do no better than emulate the good Dr. Tenib. Is time to go home.
Sleep on question. Maybe tomorrow you give me answers.

PSYCHOMETRIKA—VOL. 18, No. 4
DECEMBER, 1953 


IMAGE THEORY FOR THE STRUCTURE OF QUANTITATIVE 
VARIATES* 


Louis GUTTMAN 
THE ISRAEL INSTITUTE OF APPLIED SOCIAL RESEARCH 


A universe of infinitely many quantitative variables is considered,
from which a sample of n variables is arbitrarily selected. Only linear least-squares
regressions are considered, based on an infinitely large population of individuals
or respondents. In the sample of variables, the predicted value of a
variable x from the remaining n − 1 variables is called the partial image of x,


1. The Multiple-Correlation Approach to the Notion of “Commonness” 


There are two ways in which it is conventional to try to explain “why” 
statistical variables are intercorrelated. One is based on multiple correlation 
and the other on partial correlation.

The partial-correlation approach has been utilized to develop a theory 
to explain all intercorrelations simultaneously within a set of variates, 
namely, the theory of common factors. Spearman’s celebrated hypothesis 
was that mental tests were intercorrelated because they had a single general 
factor in common; if this factor were partialed out, no correlations would 
remain. The generalization to multiple common factors by Spearman, 
Thurstone, and others remains a partial-correlation approach. If m variables 
can be found such that when they are partialed out the observed inter-test 
correlations vanish, then these m variables are said to constitute a set of
m common factors, and to represent what the original tests have in common
(4).

Common-factor theory is still beset with several different kinds of
problems of indeterminacy (among them the problems of communalities,
of rotation of axes, and of estimating factor scores) arising from the fact
that the m variables to be partialed out are hypothetical in the first instance.
Many controversies exist as to how to make these variables concrete, and
many scientists are sceptical of the validity of the basic premises.

*... Nodular theory ... to data with curvilinear regressions (6). Order-factor theory
... among the observed variables and of separable factors (7). ... results published
since this paper was written: “... approaches to factor analysis,” Annual Technical
Report on ... This research was aided by an uncommitted grant-in-aid
from the Ford Foundation.

It is interesting that hitherto only the partial-correlation approach— 
using controversial hypothetical variables—has been used for a structural 
analysis of a set of variates, despite the fact that the more concrete notions 
involved in the multiple-correlation approach seem older and more widely 
accepted. Apparently no systematic attempt has been made previously to 
capitalize on the structural possibilities of the multiple-correlation approach. 
Such an investigation is the purpose of the present paper. 

We shall show how the intercorrelations within a set of variables all 
can be simultaneously interpreted or explained by means of their mutual 
multiple regressions, the latter determining, in a certain unambiguous manner, 
what the observations have in common. 

We treat here the case of quantitative variables with linear least-squares 
regressions. Elsewhere we shall treat qualitative cases [as in (5)]. 

For the multiple-correlation approach, we need introduce no hypothetical
variables. If we are given a set of n observed variables $x_j$, we can consider
directly the multiple regression of each variable on all the remaining n − 1
variables. If $r_j$ is the resulting coefficient of multiple correlation for $x_j$,
then traditionally $r_j^2$ has been called the “index* of determination” of $x_j$
from the remaining variables, and $\sqrt{1 - r_j^2}$ the “coefficient of alienation.”

Indeed, $r_j^2$ represents the proportion of the total variance of $x_j$ that is
dependent on the remaining variables, and in a real sense expresses how
much $x_j$ has in common with other variables. If $r_j^2 = 0$, then $x_j$ has nothing
in common with the other variables; in fact, it also then correlates zero
with each separately. If $r_j^2 = 1$, then $x_j$ is linearly dependent on the remaining
variables, so that whatever could be done with $x_j$ could be done as well
without it; the remaining variables contain all the relevant information for
any problem. Values of $r_j^2$ intermediate between zero and unity, then, express
intermediate degrees of commonness between $x_j$ and all the remaining
variables.

This can be seen further by studying the classical normal equations from
which one computes the multiple regression coefficients. According to these
equations, we break $x_j$ up into two parts, say $p_j$ and $e_j$, where $p_j$ is the
predicted value and $e_j$ is the error of prediction:

$$x_j = p_j + e_j.$$


*Although there is no standard usage in the literature, we shall systematically use the
word “index” to refer to the square of a correlation coefficient, to distinguish the square
from the coefficient itself.


The equations state in particular that $e_j$ correlates zero with $p_j$: the prediction
and the error of prediction are uncorrelated. But more important for us, the
equations state more generally that $e_j$ correlates zero with each predictor
$x_k$ separately, or

$$r_{e_j x_k} = 0 \qquad (k \neq j).$$

Thus $x_j$ is broken up into two parts. One part, $p_j$, is perfectly dependent
on the remaining n − 1 variables; the other part, $e_j$, correlates zero with
each and every one of the n − 1 predictors, and hence with any possible
(linear) combination of them. The multiple correlation of $p_j$ on the predictors
is perfect; the multiple correlation of $e_j$ on the predictors is zero.

Multiple correlation gives us this simple and profound property of break- 
ing each variable into two parts, one of which is determined entirely by the 
remaining variables, and the other of which has no relation with the remaining 
variables. 
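
The two-part break can be checked numerically at the population level. The sketch below is an illustration under assumptions, not a computation from the paper: the 3 × 3 correlation matrix is hypothetical, the first variable is regressed on the other two by solving the 2 × 2 normal equations directly, and the variable names are made up for the example.

```python
# Hypothetical population correlations among three standardized variables.
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]

def regress_first_on_rest(R):
    """Least-squares weights of x1 on x2 and x3 from the 2 x 2 normal equations."""
    a, b, c = R[1][1], R[1][2], R[2][2]
    det = a * c - b * b
    w2 = (c * R[0][1] - b * R[0][2]) / det
    w3 = (a * R[0][2] - b * R[0][1]) / det
    return w2, w3

w2, w3 = regress_first_on_rest(R)

# Covariances implied by x1 = p1 + e1, with p1 = w2*x2 + w3*x3.
var_p = w2 * R[0][1] + w3 * R[0][2]                  # variance of p1, the squared multiple correlation
cov_e_x2 = R[0][1] - (w2 * R[1][1] + w3 * R[1][2])   # covariance of e1 with x2
cov_e_x3 = R[0][2] - (w2 * R[1][2] + w3 * R[2][2])   # covariance of e1 with x3
cov_e_p = w2 * cov_e_x2 + w3 * cov_e_x3              # covariance of e1 with p1
var_e = 1.0 - var_p                                  # variance of e1

print(round(var_p, 4), round(var_e, 4), cov_e_x2, cov_e_x3, cov_e_p)
```

The error part has zero covariance with each predictor and with the predicted part, and the two variances sum to unity, as the text describes.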

The study of the common and alien parts of the observed variates,
as defined by multiple correlation, we propose to call image analysis, a name 
suggested by the n-dimensional geometry of the situation (11). 

Paradoxically, the alien parts can play a role in the observed inter-test
correlations, which is one of the major points analyzed in the present paper,
especially in §8 below. Indeed, in a sense, the “alien” parts are more basic
than the “common” parts, as shown in the final §11 below.


2. Relationship to Common-Factor Theory


It is of interest to inquire as to what relationship image analysis has to 
common-factor analysis in the Spearman-Thurstone sense. It turns out 
that image analysis is the more basic and inclusive approach; it includes 
common-factor theory as a special case. That this might be so could possibly 
be surmised by considering the respective properties of partial correlation and 
multiple correlation. A partial correlation coefficient in general can either 
increase or decrease as the number of variables eliminated increases; but 
common-factor theory is concerned only with a special kind of circumstance
wherein partial correlations tend only to zero. On the other hand, a multiple 
correlation can never decrease as the number of predictors increases; in 
general, the correlation increases. This nondecreasing property is all that is 
required by image theory; so no restrictions at all are involved, and the theory 
is universally appropriate. 
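
The nondecreasing property of the multiple correlation can be seen in a small numerical case. The sketch below assumes a hypothetical 4 × 4 population correlation matrix and computes the squared multiple correlation of the first variable on one, two, and three predictors by solving the normal equations; `solve` and `squared_multiple_correlation` are names introduced only for this illustration.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))) / M[i][i]
    return w

def squared_multiple_correlation(R, target, predictors):
    """Squared multiple correlation of one standardized variable on a set of others."""
    A = [[R[p][q] for q in predictors] for p in predictors]
    b = [R[p][target] for p in predictors]
    w = solve(A, b)
    return sum(wi * bi for wi, bi in zip(w, b))

# Hypothetical population correlations among four variables.
R = [[1.0, 0.6, 0.5, 0.3],
     [0.6, 1.0, 0.4, 0.2],
     [0.5, 0.4, 1.0, 0.1],
     [0.3, 0.2, 0.1, 1.0]]

r2 = [squared_multiple_correlation(R, 0, list(range(1, m + 1))) for m in (1, 2, 3)]
print([round(v, 4) for v in r2])
```

Each added predictor leaves the squared multiple correlation the same or larger, here rising from 0.36 with one predictor to about 0.44 with two and higher still with three.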

Because of its universality, image theory throws considerable light on 
common-factor theory, as well as on order-factor theory (7) and on any 
other special theory. It shows under what special circumstances a universe 
of data admits of a common-factor structure at all, regardless of the number 
of common factors. This we shall see in the present paper. In a later paper,
we shall see how image theory explains why the problem of communalities
has not been solved yet in the Spearman-Thurstone theory, and how a
universal solution is impossible; it will also be shown there how misleading
present computing routines can sometimes be that are based on “extracting”
common factors (10).

A new, universal computing routine will be suggested that will help
distinguish for a given set of data whether they have a finite common-factor
structure (no matter what the finite number of common factors may be),
an order-factor structure (simplex, circumplex, etc.), or some other kind of
structure.


3. The Observed Correlation Matrix 


We are concerned with a universe of content* of indefinitely many quantitative
variables, which is defined in advance of any statistical analysis.
We assume that each variable has a finite population variance, and that all
regressions are linear.†

It is essential to distinguish between the universe and any finite sample
of n variables that may be selected from it. We assume nothing about random
sampling of variables but that we can arrange the universe in an arbitrary
order in which our particular sample will be the first n variables.

With respect to the population of individuals observed, it too is assumed
indefinitely large. We are not concerned here with samples of people; throughout
we treat only population parameters.

If $x_j$ denotes the jth variable from the universe, then let $x_{ji}$ be the
score of person i on this variable. As usual, we can set the population mean
of each variable equal to zero, and the variance equal to unity. Thus,

$$E\,x_{ji} = 0, \qquad E\,x_{ji}^2 = 1 \qquad (j = 1, 2, \ldots), \tag{1}$$

where the notation E denotes the expected or mean value over the infinite
population of individuals. Then the population correlation coefficient, $r_{jk}$,
between any two observed variables is simply their covariance,

$$r_{jk} = E\,x_{ji} x_{ki} \qquad (j, k = 1, 2, \ldots). \tag{2}$$

If we are dealing with a finite number of n variables, then the values
$r_{jk}$ can be expressed as a Gramian matrix of order n which we shall denote
by $R_n$,

$$R_n = \| r_{jk} \| \qquad (j, k = 1, 2, \ldots, n). \tag{3}$$

The entries in the main diagonal of $R_n$ are each unity, according to (2) and
(1), indicating the total self-correlations.

*A term originating in the context of scale analysis, but ...
†... the theorems below do not depend on the true regressions being linear or
curvilinear. Of course, the meaning of our results is fullest if the true regressions
are linear; and even more, if the zero correlations imply complete statistical independence.


As more variables from the universe are added to the initial set of n
in (3), nothing happens to the initial entries $r_{jk}$ except that more rows and
columns surround them. An observed correlation coefficient as in (2) between
any two variables is not a function of n; it does not depend on which other
variables are in the set. Therefore, if we inquire what happens to $R_n$ in the
limit as $n \to \infty$, we can state that there always is a limiting matrix, which
we shall denote by $R_\infty$,

$$R_\infty = \lim_{n \to \infty} R_n. \tag{4}$$

$R_\infty$ is an infinite Gramian matrix, and represents the correlations between
all variables in the infinite universe of content.


4. The Inverse of the Correlation Matrix and Its Problem of Limits 


The inverse of the observed correlation matrix plays a central role in 
our analysis. We shall usually assume that, for a given set of n variables, 
$R_n$ is nonsingular and possesses an inverse. This will be true, for instance,
if all observed variables are experimentally independent and have retest 
reliabilities less than unity. The assumption of nonsingularity is usually 
correct in practice. 

The inverse of $R_n$ will be denoted as usual by $R_n^{-1}$. In contrast to the
elements of $R_n$, the elements of $R_n^{-1}$ are functions of n and change as additional
variables are added to the set. As is well known, the elements of $R_n^{-1}$
can be expressed in terms of minor determinants of $R_n$. Let

$A^{(n)}$ = the determinant of $R_n$,

and let

$A_{jk}^{(n)}$ = the cofactor of $r_{jk}$ in $R_n$.

Then:

$$R_n^{-1} = \left\| \frac{A_{jk}^{(n)}}{A^{(n)}} \right\|. \tag{5}$$

$A^{(n)}$ clearly varies as n varies; and for fixed j and k, $A_{jk}^{(n)}$ also varies with n.
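
The contrast between the fixed entries of the correlation matrix and the shifting entries of its inverse can be seen in a small case. The sketch below assumes a hypothetical universe of four variables, takes the leading n × n blocks as the samples, and forms each inverse from cofactors and the determinant as in (5). The Laplace expansion used for the determinant is only practical for very small n, and the function names are illustrative.

```python
def minor(M, j, k):
    """Matrix M with row j and column k removed."""
    return [row[:k] + row[k + 1:] for i, row in enumerate(M) if i != j]

def det(M):
    """Determinant by Laplace expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** k * M[0][k] * det(minor(M, 0, k)) for k in range(len(M)))

def inverse_by_cofactors(R):
    """Inverse with element (j, k) equal to cofactor(k, j) / determinant."""
    n, d = len(R), det(R)
    return [[(-1) ** (j + k) * det(minor(R, k, j)) / d for k in range(n)]
            for j in range(n)]

# Hypothetical correlations; the first n rows and columns play the role of the sample.
universe = [[1.0, 0.6, 0.5, 0.3],
            [0.6, 1.0, 0.4, 0.2],
            [0.5, 0.4, 1.0, 0.1],
            [0.3, 0.2, 0.1, 1.0]]

first_elements = {}
for n in (2, 3, 4):
    R_n = [row[:n] for row in universe[:n]]
    first_elements[n] = inverse_by_cofactors(R_n)[0][0]
    print(n, R_n[0][1], round(first_elements[n], 4))
```

The correlation between the first two variables is 0.6 for every n, while the leading element of the inverse changes each time a variable is added.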

The sample of n variables is studied in order to yield inferences about
the universe of content. We must ask what will happen if we increase the
size of the sample. If nothing definite happens in the limit as $n \to \infty$, then
surely we cannot infer much about the universe, and any structural theory
we may have will be unfounded. The differences among finite common-factor
structures, order-factor structures, and other kinds of structures will
be seen to depend largely on what happens in the limit of $R_n^{-1}$ as $n \to \infty$.

First we note that, in dealing with infinite matrices, the algebra of
finite matrices need not at all hold. If there is a definite limit to $R_n^{-1}$, then in
general it is not the same as the inverse of $R_\infty$; that is, in general,

$$R_\infty^{-1} \neq \lim_{n \to \infty} R_n^{-1}, \tag{6}$$

even if both members exist. Indeed, the right member of (6) may exist and
hence be uniquely defined; but at the same time $R_\infty^{-1}$ need not exist, or alternatively
$R_\infty^{-1}$ may represent more than one matrix. Even if $R_n^{-1}$ converges to
something definite, we have no assurance that there exists a matrix $R_\infty^{-1}$
such that $R_\infty R_\infty^{-1} = I_\infty$. Even if such an $R_\infty^{-1}$ exists, it may be only a right
inverse and not a left inverse, so that $R_\infty^{-1} R_\infty \neq I_\infty$; or there may be more
than one such inverse to $R_\infty$. These are paradoxes of infinity. For finite
matrices, right and left inverses always are identical and unique.

The importance of inequality (6) is illustrated by common-factor theory.
A number of years ago it was proved that the foundations of the Spearman-
Thurstone approach rest essentially on the proposition that inequality (6)
holds; in particular that $\lim_{n \to \infty} R_n^{-1}$ exists and is a
diagonal matrix (2) or that nondiagonal elements vanish in the limit,

$$\lim_{n \to \infty} \frac{\Delta_{jk}^{(n)}}{\Delta^{(n)}} = 0; \quad (j \neq k). \tag{7}$$

Such a diagonal matrix clearly cannot be an inverse for $R_\infty$. This
hitherto little-noticed theorem has most practical consequences, for it
provides an entirely new way of testing empirical data for the possible
existence of common factors. Given an observed matrix $R_n$, compute $R_n^{-1}$
and see if the nondiagonal elements are all close to zero. Such a criterion
requires no preliminary determination of "communalities" nor "fitting" of
factor loadings, nor specification of the number of common factors. If the
criterion (7) is not satisfied, then it is usually futile to attempt to "fit"
any common-factor space of finite rank to the data. Image theory will enable
us to improve on and to generalize this criterion, as is shown in §11 below.

One example where criterion (7) is not satisfied can be shown to be the
simplex matrix, where the correlations have the law of formation,

$$r_{jk} = a_j b_k, \quad (j < k),$$

$a_j$ and $b_j$ being two certain parameters belonging to $x_j$. It is futile
to attempt to find any finite number of common factors for such a matrix as
$n \to \infty$. Actually, a much simpler theory than that of finite common
factors holds (7).
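Criterion (7) and the simplex counterexample can be illustrated numerically. The following Python sketch is not part of the original paper. It assumes two hypothetical batteries: one in which every pair of tests correlates 1/2 (a single common factor), and a simplex with $r_{jk} = \rho^{|j-k|}$, which has the stated law of formation with $a_j = \rho^{-j}$ and $b_k = \rho^{k}$. All function names are illustrative.

```python
from fractions import Fraction


def inverse(matrix):
    """Invert a square matrix by Gauss-Jordan elimination in exact arithmetic."""
    n = len(matrix)
    # Augment every row with the matching row of the identity matrix.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in a row whose entry in this column is nonzero, then normalise it.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def one_factor(n, rho):
    """Hypothetical battery in which every pair of tests correlates rho."""
    return [[Fraction(1) if j == k else rho for k in range(n)] for j in range(n)]


def simplex(n, rho):
    """Hypothetical simplex: r_jk = rho**|j-k| = a_j * b_k for j < k."""
    return [[rho ** abs(j - k) for k in range(n)] for j in range(n)]


def largest_off_diagonal(matrix):
    """Largest absolute nondiagonal element, the quantity inspected in (7)."""
    n = len(matrix)
    return max(abs(matrix[j][k]) for j in range(n) for k in range(n) if j != k)


rho = Fraction(1, 2)
for n in (4, 8, 16):
    print(n,
          float(largest_off_diagonal(inverse(one_factor(n, rho)))),
          float(largest_off_diagonal(inverse(simplex(n, rho)))))
```

Under these assumptions the nondiagonal elements of the inverse shrink toward zero as tests are added in the one-factor case, while in the simplex the elements next to the main diagonal stay the same size however many tests are added.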

So much for what is for the moment a digression, to emphasize the
importance of proving the possible existence of any quantities we may want to
hypothesize. It is not to be regarded merely as a matter of mathematical
pedantry.


5. Partial Images and Total Images 


A sample of $n$ variables from the universe of content defines a partial
image for each variable, namely, its predicted values from the remaining
$n - 1$ variables. The limit of the partial images of $x_j$ as $n$ becomes
infinite will be called the total image of $x_j$ in the universe of content.
Let $p_{ji}^{(n)}$ denote the predicted value of $x_{ji}$ from the remaining
$n - 1$ variables in the sample, and let $p_{ji}^{(\infty)}$ denote the limit
as $n \to \infty$:

$$p_{ji}^{(\infty)} = \lim_{n \to \infty} p_{ji}^{(n)}. \tag{8}$$

We assume for the moment that the limit in (8) exists; in a later paper (9)
we shall examine this assumption. Then $p_{ji}^{(n)}$ is the partial image
score for person $i$ for variable $x_j$, and $p_{ji}^{(\infty)}$ is his total
image score for $x_j$.

It is well-known how to compute $p_{ji}^{(n)}$ from the observed data. If we
let $w_{jk}^{(n)}$ denote the weight of $x_k$ in the multiple regression for
predicting $x_j$, then (cf. 1, 306),

$$w_{jk}^{(n)} = -\frac{\Delta_{jk}^{(n)}}{\Delta_{jj}^{(n)}}, \quad (j \neq k), \tag{9}$$

provided the denominator on the right does not vanish. Notice that (9)
does not define a value for $j = k$, for this would imply a weight for
predicting the test from itself. It will be convenient, however, apparently
to include the test itself in its own regression by the artifice of giving it
a regression weight of zero. So we define the value for $j = k$ as follows:

$$w_{jj}^{(n)} = 0, \tag{10}$$

for all $j$ and $n$. With this artifice, we can express the partial image
scores of the $x_j$ in the following convenient form,

$$p_{ji}^{(n)} = \sum_{k=1}^{n} w_{jk}^{(n)} x_{ki}. \tag{11}$$

The total image scores of $x_j$ are then, from (8) and (11),

$$p_{ji}^{(\infty)} = \lim_{n \to \infty} \sum_{k=1}^{n} w_{jk}^{(n)} x_{ki}. \tag{12}$$

In the right member of (12), not only the number of terms being summed
depends on $n$, but also the values of the terms themselves, for the
regression weights $w_{jk}^{(n)}$ depend on all the variables used as
predictors.

Formula (9) holds for the regression weights, provided the denominator
$\Delta_{jj}^{(n)}$ does not vanish. If $R_n$ is nonsingular, the denominator
can never vanish for any $j$; but if $R_n$ is singular, (9) may not hold and
the regression weights may not be uniquely defined. Regardless, however, the
partial image scores are always uniquely defined, whether $R_n$ is singular or
nonsingular. This is well-known, but it may be well to restate here the
fundamentals involved. This we do in the next section.




6. The Fundamental Equation For Least-Squares Images. 


Consider the weights $w_{jk}^{(n)}$ in (11) as unknowns to be solved for,
except for the self-weights which are always zero as in (10). Let
$e_{ji}^{(n)}$ be the errors of estimate according to the prediction (11),
so that

$$x_{ji} = p_{ji}^{(n)} + e_{ji}^{(n)}. \tag{13}$$

Let the variance of the errors for $x_j$ be denoted by $\sigma_{jn}^2$,

$$\sigma_{jn}^2 = E_i \left[ e_{ji}^{(n)} \right]^2. \tag{14}$$

Then we wish to determine the $w_{jk}^{(n)}$ so as to minimize (14).
Differentiating the right member of (14) with respect to the $w_{jk}^{(n)}$,
using relations (13) and (11), shows that a necessary and sufficient condition
for attaining a minimum is that the following fundamental equation holds:

$$E_i\, e_{ji}^{(n)} x_{ki} = 0, \quad (j \neq k), \tag{15}$$

or that the errors be uncorrelated with each predictor separately. Fundamental
equation (15) expresses the classical normal equations of linear least squares.
There is a unique minimum to (14), obtained by unique values of $e_{ji}^{(n)}$,
and hence of $p_{ji}^{(n)}$. If $R_n$ is singular, it may be that the $x_{ki}$
in (11) are linearly dependent in such a fashion that more than one set of
$w_{jk}^{(n)}$ will yield the same best prediction $p_{ji}^{(n)}$, but the best
prediction itself is uniquely determined regardless. If $R_n$ is nonsingular,
then also the best $w_{jk}^{(n)}$ are uniquely determined by the data, namely
by formula (9).

More generally, then, we can regard (15) as our basic equation for
determining the $w_{jk}^{(n)}$, uniquely or not. Equation (15) is the basic
equation of image analysis, from which all other results follow. Together with
definitions (11) and (13), equation (15) uniquely determines the partial
images, and invests them with all their subsequent meaning and properties.
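The relation between the weights of formula (9) and the normal equations (15) can be checked at the level of the correlation matrix. The sketch below is an illustration only, not the paper's own procedure: it assumes a hypothetical three-test correlation matrix, obtains the weights from the elements of the inverse (the ratio of cofactors in (9) equals the ratio of the matching inverse elements), and evaluates the covariance of each anti-image with every observed variable.

```python
from fractions import Fraction


def inverse(matrix):
    """Invert a square matrix by Gauss-Jordan elimination in exact arithmetic."""
    n = len(matrix)
    # Augment every row with the matching row of the identity matrix.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in a row whose entry in this column is nonzero, then normalise it.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical correlations among three tests (illustrative values only).
R = [[Fraction(1), Fraction(3, 5), Fraction(1, 2)],
     [Fraction(3, 5), Fraction(1), Fraction(2, 5)],
     [Fraction(1, 2), Fraction(2, 5), Fraction(1)]]
n = len(R)
R_inv = inverse(R)

# Formulas (9) and (10): w_jk = -Delta_jk / Delta_jj for j != k, and w_jj = 0.
W = [[Fraction(0) if j == k else -R_inv[j][k] / R_inv[j][j] for k in range(n)]
     for j in range(n)]


def anti_image_predictor_cov(j, k):
    """Covariance of e_j = x_j - sum_l w_jl x_l with the observed variable x_k."""
    return R[j][k] - sum(W[j][l] * R[l][k] for l in range(n))


for j in range(n):
    print([str(anti_image_predictor_cov(j, k)) for k in range(n)])
```

Each printed row has zeros away from the main diagonal, as (15) requires; the diagonal entry is the covariance of a variable with its own anti-image, which (15) leaves unrestricted.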

The errors of prediction from the partial images play a prominent role
in our theory. We shall call the $e_{ji}^{(n)}$ the partial anti-image scores
of $x_j$. Then, parallel to (12), the total anti-image scores will be denoted
by $e_{ji}^{(\infty)}$ and

$$e_{ji}^{(\infty)} = \lim_{n \to \infty} e_{ji}^{(n)}, \tag{16}$$

assuming the limit on the right exists.

One immediate consequence of basic equation (15) is that the partial
image and anti-image of each $x_j$ correlate zero with each other,

$$E_i\, e_{ji}^{(n)} p_{ji}^{(n)} = 0. \tag{17}$$

This well-known result follows by multiplying each member of (11) by
$e_{ji}^{(n)}$, taking expectations over $i$ and using (15) and (10).




Let $\rho_{jn}$ be the multiple correlation coefficient of $x_j$ on the
remaining $n - 1$ variables. Since the variance of $x_j$ is unity, then, as is
well-known, $\rho_{jn}$ is also the standard deviation of the $p_{ji}^{(n)}$,
or,

$$\rho_{jn}^2 = E_i \left[ p_{ji}^{(n)} \right]^2. \tag{18}$$

A well-known consequence of (17) is, then, that

$$\rho_{jn}^2 + \sigma_{jn}^2 = 1. \tag{19}$$

It is $\rho_{jn}^2$ that has traditionally been called the "index of
determination," and $\sigma_{jn}^2$ the "index of alienation." To avoid
possible notions of determination in the sense of causation, and to use a more
convenient terminology for our purposes, we shall call $\rho_{jn}^2$ the
partial norm of $x_j$, and $\sigma_{jn}^2$ the partial antinorm.

Geometrically, the $n$ variables $x_j$ can be described as unit vectors with
a common origin, defining an $n$-dimensional Euclidean space. A correlation
$r_{jk}$ is the cosine of the angle between $x_j$ and $x_k$. The image variable
of $x_j$ is then represented by the projection of the vector $x_j$ on the
$(n - 1)$-dimensional space defined by the remaining vectors; and $\rho_{jn}$
is the cosine of the angle between $x_j$ and its projection, as well as being
the length of the projection vector. $\sigma_{jn}$ is the distance between the
termini of the vectors of $x_j$ and its projection, as well as being the cosine
of one of the angles involved.

It is interesting that this geometry of image theory was known long
before the advent of common-factor theory, which uses a similar geometry
(cf. 11).

A norm, then, is the square of the length of a test vector's projection;
and an antinorm is the square of the distance between the termini of a vector
and its projection. Equation (19) expresses the Pythagorean theorem for
the right triangle defined by the vector of $x_j$ and its projection or image.
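As a worked illustration (not in the original text), consider the smallest case, $n = 2$, with a single observed correlation $r_{12} = r$. The lines below apply (9), (11), (13), (14), (18), and (19) to the first variable; the partial norm and antinorm are written $\rho_{1n}^2$ and $\sigma_{1n}^2$ with $n = 2$ understood.

```latex
\begin{aligned}
\Delta^{(2)} &= 1 - r^2, \qquad \Delta_{11}^{(2)} = 1, \qquad \Delta_{12}^{(2)} = -r, \\
w_{12}^{(2)} &= -\Delta_{12}^{(2)} / \Delta_{11}^{(2)} = r, \\
p_{1i}^{(2)} &= r\, x_{2i}, \qquad e_{1i}^{(2)} = x_{1i} - r\, x_{2i}, \\
\rho_{1n}^2 &= E_i \left[ r\, x_{2i} \right]^2 = r^2, \\
\sigma_{1n}^2 &= E_i \left[ x_{1i} - r\, x_{2i} \right]^2 = 1 - 2r^2 + r^2 = 1 - r^2, \\
\rho_{1n}^2 + \sigma_{1n}^2 &= 1 .
\end{aligned}
```

In this case the projection of $x_1$ on the one remaining vector has length $|r|$, and the right triangle of (19) has legs $|r|$ and $\sqrt{1 - r^2}$ with unit hypotenuse.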

The total norm of $x_j$ will be defined as the limit of its partial norms,
and will be denoted by $\rho_{j\infty}^2$;

$$\rho_{j\infty}^2 = \lim_{n \to \infty} \rho_{jn}^2. \tag{20}$$

A similar definition holds for the total antinorm, denoted by
$\sigma_{j\infty}^2$,

$$\sigma_{j\infty}^2 = \lim_{n \to \infty} \sigma_{jn}^2. \tag{21}$$

Obviously, if the limits in (20) and (21) exist, then from (19),

$$\rho_{j\infty}^2 + \sigma_{j\infty}^2 = 1. \tag{22}$$

That total norms and antinorms always exist is easily established.
It is well-known that a multiple correlation coefficient cannot decrease as
more variables are added to the regression. Therefore, for each $j$,
$\rho_{jn}^2$ is a monotonically nondecreasing function of $n$. Being bounded
above by unity, it follows that there must exist a limit to $\rho_{jn}^2$ as
$n \to \infty$, by the usual theorem on bounded monotone functions. Similarly,
$\sigma_{jn}^2$ always has a limit as $n \to \infty$. These results we shall
state as:

THEOREM 1: Total norms and antinorms, $\rho_{j\infty}^2$ and
$\sigma_{j\infty}^2$, always exist for each variable $x_j$ in an infinite
universe of content (where each $x_j$ has unit variance).

Further problems of existence of limits, with respect to individual
image scores and parameters associated with them, will be treated in a
separate paper (9).
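The monotone behaviour on which Theorem 1 rests can be seen in a small numerical sketch. This is an illustration under stated assumptions, not a result from the paper: it takes a hypothetical universe in which every pair of tests correlates 1/2, and computes the partial norm of the first test as $1 - \Delta^{(n)} / \Delta_{11}^{(n)}$, that is, one minus the reciprocal of the leading diagonal element of $R_n^{-1}$, which is the standard determinantal expression for a squared multiple correlation.

```python
from fractions import Fraction


def inverse(matrix):
    """Invert a square matrix by Gauss-Jordan elimination in exact arithmetic."""
    n = len(matrix)
    # Augment every row with the matching row of the identity matrix.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in a row whose entry in this column is nonzero, then normalise it.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def equicorrelated(n, rho):
    """Hypothetical sample of n tests in which every pair correlates rho."""
    return [[Fraction(1) if j == k else rho for k in range(n)] for j in range(n)]


def partial_norm(R, j=0):
    """Squared multiple correlation of test j on the remaining tests."""
    return 1 - 1 / inverse(R)[j][j]


rho = Fraction(1, 2)
# Partial norms of the first test for samples of n = 2, 3, ..., 8 tests.
norms = [partial_norm(equicorrelated(n, rho)) for n in range(2, 9)]
print([str(v) for v in norms])
```

For this assumed universe the partial norms rise steadily with $n$ while remaining below 1/2, so the sequence is bounded and monotone, which is all that the existence argument needs.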


7. The Fundamental Identity for Least-Squares Images 


The purpose of any structural analysis is to provide a framework for
comprehending the interrelationships among observations. Our present
problem is to "explain" the correlation coefficients $r_{jk}$. For this
purpose, image analysis has a universally applicable "explanation," as stated
in the following fundamental theorem:

THEOREM 2: The correlation between any two different observed variables
(with unit variances) from a given set of $n$ variables is the difference
between the covariance of their respective partial images and the covariance
of their respective partial anti-images. That is, if we let $g_{jk}^{(n)}$ and
$\gamma_{jk}^{(n)}$ be the covariance between the partial images and
anti-images, respectively, for $x_j$ and $x_k$, or,

$$g_{jk}^{(n)} = E_i\, p_{ji}^{(n)} p_{ki}^{(n)} \tag{23}$$

and

$$\gamma_{jk}^{(n)} = E_i\, e_{ji}^{(n)} e_{ki}^{(n)}, \tag{24}$$

then the following identity always holds:

$$r_{jk} = g_{jk}^{(n)} - \gamma_{jk}^{(n)}; \quad (j \neq k). \tag{25}$$


To establish the theorem, we first multiply both members of (13) by
$x_{ki}$, take expectations over $i$ (remembering (2)) and arrive at

$$r_{jk} = E_i\, p_{ji}^{(n)} x_{ki} + E_i\, e_{ji}^{(n)} x_{ki}. \tag{26}$$

Now the second term on the right vanishes for $j \neq k$, according to (15).
Hence (26) becomes

$$r_{jk} = E_i\, p_{ji}^{(n)} x_{ki}; \quad (j \neq k). \tag{27}$$

The left member is symmetric in $j$ and $k$. Hence we can rewrite the right
member by interchanging $j$ and $k$ without altering the result:

$$r_{jk} = E_i\, p_{ki}^{(n)} x_{ji}; \quad (j \neq k). \tag{28}$$

Equations (28) and (27) state that $x_j$ has the same covariance with
$x_k$ as with the partial image of $x_k$, and vice versa.

Multiply both members of (13) by $p_{ki}^{(n)}$, take expectations over $i$,
and use definition (23):

$$E_i\, p_{ki}^{(n)} x_{ji} = g_{jk}^{(n)} + E_i\, p_{ki}^{(n)} e_{ji}^{(n)}. \tag{29}$$

Multiply both members of (13) by $e_{ki}^{(n)}$, take expectations over $i$,
and use definition (24):

$$E_i\, e_{ki}^{(n)} x_{ji} = E_i\, p_{ji}^{(n)} e_{ki}^{(n)} + \gamma_{jk}^{(n)}. \tag{30}$$

Now the left member of (30) vanishes for $j \neq k$, according to (15).
Hence, from (30),

$$-\gamma_{jk}^{(n)} = E_i\, p_{ji}^{(n)} e_{ki}^{(n)} = E_i\, p_{ki}^{(n)} e_{ji}^{(n)}; \quad (j \neq k). \tag{31}$$

The last member is obtained from the middle member by interchanging
subscripts $j$ and $k$, which is permissible by virtue of the symmetry of the
first member.

Using (31) in (29), and then (29) in (28) establishes that (25) is correct,
or Theorem 2 holds.

It should be remarked that there are no assumptions whatsoever in
establishing equation (25); it is a universal identity. We have not used here
the assumption that $R_n$ is nonsingular, or that the $w_{jk}^{(n)}$ are
uniquely defined as by (9). Only the basic normal equations (15) have been
used, which assure unique values for images and anti-images even when $R_n$ is
singular and (9) does not hold.
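Identity (25) can be verified by hand in the smallest case. The following worked example is an illustration added here, assuming $n = 2$ and a single observed correlation $r_{12} = r$, so that each variable is predicted from the other with regression weight $r$ (superscripts $(2)$ are omitted).

```latex
\begin{aligned}
p_{1i} &= r\, x_{2i}, & p_{2i} &= r\, x_{1i}, \\
e_{1i} &= x_{1i} - r\, x_{2i}, & e_{2i} &= x_{2i} - r\, x_{1i}, \\
g_{12} &= E_i\, p_{1i} p_{2i} = r^2\, E_i\, x_{1i} x_{2i} = r^3, \\
\gamma_{12} &= E_i\, e_{1i} e_{2i} = r - r - r + r^3 = r^3 - r, \\
g_{12} - \gamma_{12} &= r^3 - (r^3 - r) = r = r_{12} .
\end{aligned}
```

The image covariance alone ($r^3$) understates the observed correlation when $0 < r < 1$; the difference is supplied by the negative anti-image covariance.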


8. Interpretation of the Fundamental Identity 


According to identity (25), any correlation coefficient can be regarded 
as the difference between two covariances, one from the common parts of the 
two variables and the other from the alien parts. 

Students of common-factor theory may be puzzled at first by the fact
that the alien parts should be correlated and affect the observed correlation
$r_{jk}$. They are accustomed to the notion of "specific" or "unique" parts
which are mutually uncorrelated and do not affect the $r_{jk}$, $(j \neq k)$.
They may be tempted to take the point of view that the $\gamma_{jk}^{(n)}$
represent "errors of fit" of the image covariances $g_{jk}^{(n)}$ to the
observed correlations $r_{jk}$. We shall see that such a point of view is
correct, provided a finite common-factor space really exists (regardless of
what dimensionality) for the entire universe as $n \to \infty$.

But we shall also see that such a point of view is very specialized. Let
us first take the most general view of the situation, and then we shall see
how various specializations can occur.




The fundamental equation (15) which led to the fundamental identity
(25) states that the anti-image of $x_j$ is orthogonal to (uncorrelated with)
each of the remaining $x_k$. But, paradoxically, this anti-image is not
necessarily orthogonal to the anti-images of the $x_k$; $\gamma_{jk}^{(n)}$ is
not necessarily zero for any pair of subscripts. An anti-image is always
orthogonal to a total predictor, but not necessarily to parts of that
predictor.

It is indeed peculiar that $e_{ji}^{(n)}$ should always be orthogonal to
$x_{ki}$, $(j \neq k)$, but not necessarily to $e_{ki}^{(n)}$ or to
$p_{ki}^{(n)}$. It will seem less peculiar if we examine the meaning of
$\gamma_{jk}^{(n)}$ more closely. We shall show now that $\gamma_{jk}^{(n)}$ is
intimately related to the partial correlation between $x_j$ and $x_k$, holding
constant the remaining $n - 2$ variables.

In order to avoid details unnecessary to the main argument, let us
assume $R_n$ to be nonsingular, so that we can use formula (9) for the various
regression weights, as well as further convenient determinantal formulas.
Let $r_{jk}^{(n)}$ denote the partial correlation between $x_j$ and $x_k$,
eliminating the $n - 2$ remaining variables. This means that $x_j$ and $x_k$
first are predicted separately from the $n - 2$ remaining variables (where now
$x_j$ is not used in the regression for $x_k$, nor $x_k$ in the regression for
$x_j$, so that different weights are involved from those of the respective
partial-image regressions on $n - 1$ variables) and then the resulting errors
of prediction are correlated (1) to define the partial correlation coefficient
$r_{jk}^{(n)}$. The well-known determinantal formula for this partial
correlation is (1)

$$r_{jk}^{(n)} = -\frac{\Delta_{jk}^{(n)}}{\sqrt{\Delta_{jj}^{(n)} \Delta_{kk}^{(n)}}}, \quad (j \neq k). \tag{32}$$


Now if we multiply both members of (11) by $e_{ki}^{(n)}$, take expectations
over $i$, and use (31), (15), and (10), we have

$$\gamma_{jk}^{(n)} = -w_{jk}^{(n)} E_i\, x_{ki} e_{ki}^{(n)}; \quad (j \neq k). \tag{33}$$

But

$$E_i\, x_{ji} e_{ji}^{(n)} = \sigma_{jn}^2, \tag{34}$$

as can be seen by multiplying both members of (13) by $e_{ji}^{(n)}$, taking
expectations over $i$ and using (17). Hence, from (33) and (34), and revising
the subscripts, we have

$$\gamma_{jk}^{(n)} = -w_{jk}^{(n)} \sigma_{kn}^2; \quad (j \neq k). \tag{35}$$

The determinantal formula for $\sigma_{jn}^2$ is (1)

$$\sigma_{jn}^2 = \frac{\Delta^{(n)}}{\Delta_{jj}^{(n)}}. \tag{36}$$

Therefore, using (36) and (9) in (35) we arrive at the determinantal formula
for the covariance between any two anti-images,

$$\gamma_{jk}^{(n)} = \frac{\Delta^{(n)} \Delta_{jk}^{(n)}}{\Delta_{jj}^{(n)} \Delta_{kk}^{(n)}}; \quad (j \neq k). \tag{37}$$

For our purposes, a more striking and important way of expressing
$\gamma_{jk}^{(n)}$ is arrived at by using (36) and (32) in (37) to obtain:

$$\gamma_{jk}^{(n)} = -r_{jk}^{(n)} \sigma_{jn} \sigma_{kn}; \quad (j \neq k). \tag{38}$$

Identity (38) shows precisely how $\gamma_{jk}^{(n)}$ is related to
$r_{jk}^{(n)}$. This identity affords an explanation for the paradox of
possibly correlated alien parts, which we shall state here as a theorem.


THEOREM 3: If $R_n$ is nonsingular, and if $\rho_{jk}^{(n)}$ is the
correlation between partial anti-images $e_{ji}^{(n)}$ and $e_{ki}^{(n)}$, then
this anti-image intercorrelation is equal to the negative of the corresponding
partial correlation:

$$\rho_{jk}^{(n)} = -r_{jk}^{(n)}; \quad (j \neq k).$$

The covariance $\gamma_{jk}^{(n)}$ vanishes if and only if $r_{jk}^{(n)}$
vanishes $(j \neq k)$.

This theorem follows directly from (38), by recognizing that
$\rho_{jk}^{(n)} = \gamma_{jk}^{(n)} / \sigma_{jn} \sigma_{kn}$ from the usual
product-moment formula for a correlation coefficient.

That $\rho_{jk}^{(n)}$ should be equal and opposite in sign to $r_{jk}^{(n)}$
is "obvious" from the geometric picture involved (cf. 11). $r_{jk}^{(n)}$ is
the cosine of the angle, say $\theta$, between two hyperplanes, while
$\rho_{jk}^{(n)}$ is the cosine of the angle between perpendiculars to these
two hyperplanes, or of an angle equal to $180^\circ - \theta$. Theorem 3 thus
boils down to be a special version of the trigonometric identity that
$\cos(180^\circ - \theta) = -\cos\theta$.

According to Theorem 3, after we have subtracted out the common
part (the partial image) from $x_j$, the alien part that remains behaves
toward $x_k$ almost as if $x_k$ were not in the regression for predicting
$x_j$. Subtracting out the common parts of the $n$ variables still leaves room
for pairwise linkages to remain between them of the kind described by their
partial correlations.

We can now interpret our fundamental identity (25) by rewriting it,
using (38), as

$$r_{jk} = g_{jk}^{(n)} + r_{jk}^{(n)} \sigma_{jn} \sigma_{kn}; \quad (j \neq k). \tag{39}$$

According to (39), an observed total correlation $r_{jk}$ can be regarded as
arising from two sources: (a) the covariance between the common parts of the
two variables, and (b) a special pairwise linkage that may remain between the
two variables after the remaining $n - 2$ variables are partialed out.
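Theorem 3 can be checked numerically for three variables, where the partial correlation has a familiar closed form. The sketch below is illustrative only and assumes a hypothetical correlation matrix. It builds the anti-image covariances directly from the regression weights of (9) and (10), and compares the resulting anti-image correlation with the textbook first-order partial correlation, which is computed independently of the inverse matrix.

```python
from fractions import Fraction
from math import isclose, sqrt


def inverse(matrix):
    """Invert a square matrix by Gauss-Jordan elimination in exact arithmetic."""
    n = len(matrix)
    # Augment every row with the matching row of the identity matrix.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in a row whose entry in this column is nonzero, then normalise it.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical correlations among three tests (illustrative values only).
R = [[Fraction(1), Fraction(3, 5), Fraction(1, 2)],
     [Fraction(3, 5), Fraction(1), Fraction(2, 5)],
     [Fraction(1, 2), Fraction(2, 5), Fraction(1)]]
n = len(R)
R_inv = inverse(R)

# Regression weights from (9) and (10).
W = [[Fraction(0) if j == k else -R_inv[j][k] / R_inv[j][j] for k in range(n)]
     for j in range(n)]
# M = I - W maps observed scores to anti-image scores: e = M x.
M = [[Fraction(int(j == k)) - W[j][k] for k in range(n)] for j in range(n)]
# Anti-image covariances of definition (24): Gamma = M R M'.
MR = [[sum(M[j][l] * R[l][k] for l in range(n)) for k in range(n)] for j in range(n)]
Gamma = [[sum(MR[j][l] * M[k][l] for l in range(n)) for k in range(n)]
         for j in range(n)]

# Correlation between the anti-images of the first two tests.
anti_image_corr = float(Gamma[0][1]) / sqrt(float(Gamma[0][0]) * float(Gamma[1][1]))

# Classical first-order partial correlation of tests 1 and 2, eliminating test 3.
r12, r13, r23 = float(R[0][1]), float(R[0][2]), float(R[1][2])
partial_corr = (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

print(anti_image_corr, partial_corr)
```

The two printed values agree in magnitude and differ in sign, which is the content of Theorem 3 for this assumed matrix.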


9. Comparison with Common-Factor Theory 


The possible pairwise linkages in identity (39) are of profound importance
for structural analysis. Different patterns of these linkages give rise to
different kinds of order-factor theories. For the theory of mental activity,
these linkages make possible some hypotheses as to the physiological workings
of the nervous system (7).

The Spearman-Thurstone common-factor theory is a special—indeed, 
degenerate—type which specifies zero pairwise linkages. [More generally, it 
is an orderless theory, which is one reason why the problem of rotation of 
axes arises (7).] 

In common-factor theory, it is hypothesized that each $x_{ji}$ can be
expressed as the sum of a common part, say $c_{ji}$, and a unique part, say
$u_{ji}$,

$$x_{ji} = c_{ji} + u_{ji}, \tag{40}$$

where the rank of $c_{ji}$ is of basic importance. If the rank is $m$, then
there are $m$ common factors, expressible with unit variances, say $y_f$
$(f = 1, 2, \cdots, m)$, such that

$$c_{ji} = \sum_{f=1}^{m} a_{jf} y_{fi}, \tag{41}$$

where $a_{jf}$ are weights for the common factors. It is assumed that the
unique parts are orthogonal to the common parts and are also orthogonal to
each other,

$$E_i\, u_{ji} c_{ki} = 0, \tag{42}$$

and

$$E_i\, u_{ji} u_{ki} = 0; \quad (j \neq k). \tag{43}$$

Hypothesis (42) holds for $j = k$ as well as $j \neq k$; for $j = k$ it is
analogous to identity (17). If (42) and (41) hold, it easily follows that each
$u_j$ is orthogonal to each common factor $y_f$ separately, which is the more
traditional way of presenting the hypotheses of common-factor theory. We are
not concerned here with the $y_f$ separately, however, and do not hypothesize
anything special about them as to whether they are orthogonal to each other or
not, nor how they are to be located within the common-factor space. It is the
common-factor space as a whole that is of present concern, and this is
represented by the $c_j$. The $c_j$ are invariant under any nonsingular
transformation of the $y_f$.

In particular, the variance $\sigma_{c_j}^2$ of $c_j$ is an invariant of the
common-factor space, and is called the communality of $x_j$ (12). Similarly
$\sigma_{u_j}^2$, the variance of $u_j$, is an invariant and is called the
uniqueness of $x_j$. Furthermore, the total variance of $x_j$, taken as unity,
is the sum of the variances of its common and unique parts:

$$\sigma_{c_j}^2 + \sigma_{u_j}^2 = 1. \tag{44}$$

Equation (44) parallels identity (19), with the communality playing the
role of the partial norm and the uniqueness the role of the partial antinorm.
A further parallel to an identity of image theory is the equation

$$E_i\, u_{ji} x_{ki} = 0; \quad (j \neq k). \tag{45}$$

That (45) follows from the common-factor hypotheses can be seen by
multiplying both members of (40) by $u_{ki}$, taking expectations over $i$,
using (42) and (43), and revising subscripts. Equation (45) is parallel to the
fundamental equation (15) of image analysis. Indeed, (45) can be used as a
starting point for the common-factor hypotheses in place of (42). Equation (42)
can be derived directly from (45) and (43) for $j \neq k$; and if $m < n$, it
can also be seen that (42) holds for $j = k$. Starting with (45) in place of
(42) will be a convenient way for us to compare common-factor theory with
image theory.

The way common-factor theory "explains" the observed intercorrelations
$r_{jk}$ is by means of its fundamental factor equation,

$$r_{jk} = E_i\, c_{ji} c_{ki}; \quad (j \neq k), \tag{46}$$

which follows from (40), (42), and (43). Common-factor analysts traditionally
expand the right member of (46) in terms of some set of $y_f$, using (41), but
this is irrelevant to the present discussion.

We can now summarize and compare the bases and consequences of
image theory and of common-factor theory as in Table 1.


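TABLE 1

Comparison of Characteristics of Image Theory and Common-Factor Theory

                      Image Theory                                          Common-Factor Theory

Basic Partition       $x_{ji} = p_{ji}^{(n)} + e_{ji}^{(n)}$                $x_{ji} = c_{ji} + u_{ji}$

Basic Definition      $p_{ji}^{(n)} = \sum_{k=1}^{n} w_{jk}^{(n)} x_{ki}$   (none)

Basic Restrictions
  (a):                $E_i\, e_{ji}^{(n)} x_{ki} = 0$; $(j \neq k)$         $E_i\, u_{ji} x_{ki} = 0$; $(j \neq k)$
  (b):                (none)                                                $E_i\, u_{ji} u_{ki} = 0$; $(j \neq k)$

Consequences
  (a):                $E_i\, e_{ji}^{(n)} p_{ki}^{(n)} = -\gamma_{jk}^{(n)}$; $(j \neq k)$   $E_i\, u_{ji} c_{ki} = 0$; $(j \neq k)$
  (b):                $E_i\, e_{ji}^{(n)} p_{ji}^{(n)} = 0$                 $E_i\, u_{ji} c_{ji} = 0$; $(m < n)$
  (c):                $\rho_{jn}^2 + \sigma_{jn}^2 = 1$                     $\sigma_{c_j}^2 + \sigma_{u_j}^2 = 1$
  (d):                $r_{jk} = g_{jk}^{(n)} - \gamma_{jk}^{(n)} = g_{jk}^{(n)} + r_{jk}^{(n)} \sigma_{jn} \sigma_{kn}$; $(j \neq k)$   $r_{jk} = E_i\, c_{ji} c_{ki}$; $(j \neq k)$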




10. The Special Case of Determinate Common Factors


According to Table 1, common-factor theory lacks a basic definition
for its common parts $c_{ji}$, and has two restrictions on the deviant parts
compared to only one restriction for image theory. The single restriction of
image theory can always be satisfied, making the basic definition unique, so
that the consequences are all identities or tautologies; they are universally
true. In contrast, the restrictions of common-factor theory do not generally
suffice to define any particular partition of the observations; more than one
common-factor partition can be found in general, with different $u_{ji}$ and
$c_{ji}$, to satisfy the restrictions. For example, if a set of $c_{ji}$ of
rank $m$ can be found to satisfy the restrictions on the $u_{ji}$, then
certainly a set of rank $m + 1$ can also be found, yielding new $u_{ji}$ which
also satisfy the restrictions. Or more than one set of $c_{ji}$ can usually be
found with the same rank $m$. Two different sets satisfying the restrictions
cannot be obtained from each other by rotations within one of their
common-factor spaces, for any set $c_{ji}$ is invariant under rotations.

This highlights one of the basic problems of indeterminacy of common- 
factor theory. More than one total common-factor space can satisfy the 
same data. (To repeat, this problem of indeterminacy is entirely different 
from that of rotation of axes, which takes place within a given common- 
factor space.) 

This indeterminacy can be removed by introducing the notion of a
determinate common-factor space. Such a space is one in which there is a
perfect regression for each common factor $y_f$ on the observed $x_j$. For
finite $n$, a common-factor space is in general indeterminate; the
common-factor scores can only be estimated from the $x_j$, with positive
variances of errors of estimate. As $n$ increases, the errors of estimate
decrease; if the limit of the errors of estimate is zero, then the common
factors are perfectly determinate in the limit. General conditions under which
a common-factor space is determinate have been established elsewhere (2);
essentially they are that there exist a limit to $R_n^{-1}$, which is a
diagonal matrix.

Instead of dealing with separate common factors $y_f$ here, let us define
the determinateness of a total common-factor space of rank $m$ as follows.
Let $b_{jk}^{(n)}$ be the regression coefficient of $x_k$ for predicting $c_j$
in the multiple regression of $c_j$ on $n$ observed variables. Then the
common-factor space of the $c_j$ will be called determinate if and only if,
for all $j$,*

$$c_{ji} = \lim_{n \to \infty} \sum_{k=1}^{n} b_{jk}^{(n)} x_{ki}. \tag{47}$$

Condition (47) can be seen to be parallel to our definition of total images
in (12). If we now fill in condition (47) as a basic definition in the table
of the previous section, it easily follows that only basic restriction (a) in
the table is now needed to prove that*

$$c_{ji} = p_{ji}^{(\infty)}, \tag{48}$$

or the common parts of images and of common-factor theory are identical in
the limit*. For the proof of (48), consider the quantity $\delta_{jn}^2$
defined by

$$\delta_{jn}^2 = E_i \left[ c_{ji} - p_{ji}^{(n)} \right]^2 = E_i \left[ u_{ji} - e_{ji}^{(n)} \right]^2. \tag{49}$$

*Except possibly for a zero proportion of the population.


That the last member equals the middle member follows directly from the
basic partitions. Expanding the last member shows that

$$\delta_{jn}^2 = \sigma_{u_j}^2 - 2 E_i\, u_{ji} e_{ji}^{(n)} + \sigma_{jn}^2. \tag{50}$$

To evaluate the middle term on the right, multiply both members of (13)
by $u_{ji}$ and take expectations over $i$, remembering (11), (10), and (45),
whence

$$E_i\, u_{ji} e_{ji}^{(n)} = E_i\, u_{ji} x_{ji} = \sigma_{u_j}^2. \tag{51}$$

That the last member of (51) is equal to the middle member is well-known
and can be seen by multiplying (40) through by $u_{ji}$, taking expectations,
and using (42). Using (51) in (50) shows that

$$\delta_{jn}^2 = \sigma_{jn}^2 - \sigma_{u_j}^2 = \sigma_{c_j}^2 - \rho_{jn}^2, \tag{52}$$

the last member following from the middle member by virtue of consequences
(19) and (44).

Since the first member of (52) is nonnegative, the last member provides
us with a new proof of a previously known theorem that a communality of
$x_j$ is always an upper bound to the square of the multiple correlation
coefficient or partial norm of $x_j$ (2, 92-93). For a given set of $n$
variables, more than one set of communalities can exist, but in all cases
these communalities cannot be less than the corresponding, uniquely defined,
partial norms. The closer a communality of $x_j$ to the partial norm of $x_j$,
then, according to (49) and (52), the closer the $c_{ji}$ to the
$p_{ji}^{(n)}$, and the closer the $u_{ji}$ to the $e_{ji}^{(n)}$.

Now, the $c_{ji}$ and $u_{ji}$, hence also $\sigma_{c_j}^2$ and
$\sigma_{u_j}^2$, do not depend on $n$. It is assumed that there is a fixed
number $m$ of common factors in the entire universe of content, and these will
appear in any sample of $n$ variables where $n > m$. In contrast, the image
partition changes with $n$, with the partial norms always increasing (or at
worst remaining constant). Our Theorem 1 above states that limiting or total
norms always exist. We can now state further what the limiting values are if
(47) holds; for it has been shown in (2) that if (47) holds, then also

$$\rho_{j\infty}^2 = \sigma_{c_j}^2, \tag{53}$$

or any communality equals the total norm. Taking the limit in (52) as
$n \to \infty$ and using (53) shows that

$$\lim_{n \to \infty} \delta_{jn}^2 = 0, \tag{54}$$

which then establishes (48). These results can be stated as:


THEOREM 4: If a common-factor space of rank $m$ is determinate for an
infinitely large universe of content, then there is no other determinate
common-factor space possible for the same universe, whether of rank $m$ or of
any other rank. The communalities are uniquely determined and are equal to the
corresponding total norms. The common-factor scores are the total image scores,
and the unique factor scores are the total anti-image scores.

We have now completed the demonstration of how common-factor
theory is a special case of image theory. For a common-factor theory to be
useful, it should be determinate; otherwise there is no uniquely defined
common-factor space; and furthermore, common-factor scores cannot be
estimated closely for use in practice. If the theory is determinate, it becomes
a special case of image theory, with the special restriction that total
anti-images are uncorrelated.
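The bound in (52), and the way the gap $\delta_{jn}^2$ closes as tests are added, can be illustrated with a small sketch. The example is hypothetical and is not taken from the paper: it assumes a single common factor with invented loadings $a_j$, so that $r_{jk} = a_j a_k$ for $j \neq k$ and the communality of the first test is $a_1^2$. The partial norm is computed as one minus the reciprocal of the matching diagonal element of $R_n^{-1}$, following (19) and (36).

```python
from fractions import Fraction


def inverse(matrix):
    """Invert a square matrix by Gauss-Jordan elimination in exact arithmetic."""
    n = len(matrix)
    # Augment every row with the matching row of the identity matrix.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in a row whose entry in this column is nonzero, then normalise it.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def one_factor_matrix(loadings):
    """Correlations r_jk = a_j * a_k (j != k) implied by one common factor."""
    n = len(loadings)
    return [[Fraction(1) if j == k else loadings[j] * loadings[k] for k in range(n)]
            for j in range(n)]


# Invented loadings; the communality of the first test is (4/5)**2 = 16/25.
loadings = [Fraction(4, 5), Fraction(3, 5), Fraction(1, 2), Fraction(7, 10),
            Fraction(3, 5), Fraction(4, 5), Fraction(1, 2), Fraction(7, 10)]
communality = loadings[0] ** 2


def gap(n):
    """delta^2 of (52) for the first test: communality minus partial norm."""
    R = one_factor_matrix(loadings[:n])
    partial_norm = 1 - 1 / inverse(R)[0][0]
    return communality - partial_norm


gaps = [gap(n) for n in (2, 4, 8)]
print([float(g) for g in gaps])
```

In this assumed model the communality stays above the partial norm for every sample size, and the difference shrinks as more tests are included, in line with (52) and (54).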


11. A Universal Computing Procedure 


The fundamental identity (25) provides a computing procedure to
analyze the structure of the interrelationships of $n$ variables from any
universe of content. Let $\Gamma_n$ be the Gramian matrix of the covariances
$\gamma_{jk}^{(n)}$, and let $S_n^2$ be the diagonal matrix defined by the
partial antinorms. These matrices can both be computed easily by first
computing $R_n^{-1}$, according to (5) and (36). Then, considering also (37),
we have the working formula,

$$\Gamma_n = S_n^2 R_n^{-1} S_n^2, \tag{55}$$

or $\Gamma_n$ is computed from $R_n^{-1}$ merely by premultiplication and
postmultiplication with the diagonal matrix $S_n^2$.

Once $\Gamma_n$ is computed, it is easy to compute $G_n$, the Gramian matrix
of the covariances $g_{jk}^{(n)}$; for by referring to (25), we can write

$$G_n = R_n + \Gamma_n - 2 S_n^2. \tag{56}$$

Only matric addition is needed to compute $G_n$ according to (56). The
diagonal matrix $2 S_n^2$ has been subtracted in the right of (56) to make the
main diagonals consistent; (25) does not define the main diagonals, which have
to be considered separately.
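The computing procedure of formulas (55) and (56) can be sketched in a few lines. The code below is an illustration, not the author's own routine: it assumes a hypothetical three-test correlation matrix, takes the partial antinorms as the reciprocals of the diagonal elements of $R_n^{-1}$ (formula (36)), and uses exact rational arithmetic so that identity (25) can be checked without rounding error.

```python
from fractions import Fraction


def inverse(matrix):
    """Invert a square matrix by Gauss-Jordan elimination in exact arithmetic."""
    n = len(matrix)
    # Augment every row with the matching row of the identity matrix.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in a row whose entry in this column is nonzero, then normalise it.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical correlations among three tests (illustrative values only).
R = [[Fraction(1), Fraction(3, 5), Fraction(1, 2)],
     [Fraction(3, 5), Fraction(1), Fraction(2, 5)],
     [Fraction(1, 2), Fraction(2, 5), Fraction(1)]]
n = len(R)
R_inv = inverse(R)

# Partial antinorms from (36): sigma_j^2 = Delta / Delta_jj = 1 / r^{jj}.
antinorms = [1 / R_inv[j][j] for j in range(n)]

# Working formula (55): Gamma = S^2 R^{-1} S^2, with S^2 = diag(antinorms).
Gamma = [[antinorms[j] * R_inv[j][k] * antinorms[k] for k in range(n)]
         for j in range(n)]

# Formula (56): G = R + Gamma - 2 S^2; S^2 only touches the main diagonal.
G = [[R[j][k] + Gamma[j][k] - (2 * antinorms[j] if j == k else 0)
      for k in range(n)] for j in range(n)]

for name, matrix in (("Gamma", Gamma), ("G", G)):
    print(name, [[str(v) for v in row] for row in matrix])
```

With these definitions the main diagonal of `Gamma` holds the partial antinorms and the main diagonal of `G` holds the partial norms, which is the sense in which subtracting $2 S_n^2$ makes the diagonals consistent.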


For there to be determinate common factors, the nondiagonal
elements of $\Gamma_n$ should all be close to zero, for their limiting value is
hypothesized to be zero.

If not all nondiagonal elements are close to zero, one is led to reject the
hypothesis that a determinate common-factor space of rank less than m
holds for the universe. One could then examine $\Gamma_n$ to see if any special
order exists among the nonvanishing pairwise linkages. It is best, of course,
to have a preliminary order theory before one examines the empirical evidence.
Examples of preliminary hypotheses are the simplex and circumplex (7).
Ultimately, special theories may have to be developed for each special kind of
content.

An important paradox is that the anti-image covariances are more
basic to the structural analysis of $R_n$ than are the image covariances. If we
know $\Gamma_n$, we know $R_n$ (the main diagonal of $\Gamma_n$ being $S_n^2$,
by (14) and (24)), for from (55) we have immediately:

$$R_n = S_n^2 \Gamma_n^{-1} S_n^2. \tag{57}$$

This is not the case with $G_n$; knowing $G_n$ is not sufficient for
determining $R_n$. In this sense, $R_n$ is determined by the alien parts,
rather than by the common parts.

Another way of stating this paradox is to express $G_n$ itself as a function
of $\Gamma_n$. From (56) and (57), we have

$$G_n = S_n^2 \Gamma_n^{-1} S_n^2 + \Gamma_n - 2 S_n^2. \tag{58}$$

Given $\Gamma_n$, we can compute $G_n$ through (58). The converse is not true;
$G_n$ by itself does not determine $\Gamma_n$. In the general case, then,
structural theories should be based on the anti-images, rather than on the
images.
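That the anti-image covariances suffice to recover the correlations can be checked directly from (57) and (58). The sketch below is illustrative and assumes a hypothetical correlation matrix; the function names are invented for the example. It first forms $\Gamma_n$ as in (55), then discards $R_n$ and rebuilds both $R_n$ and $G_n$ from $\Gamma_n$ alone, reading $S_n^2$ off the main diagonal of $\Gamma_n$.

```python
from fractions import Fraction


def inverse(matrix):
    """Invert a square matrix by Gauss-Jordan elimination in exact arithmetic."""
    n = len(matrix)
    # Augment every row with the matching row of the identity matrix.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in a row whose entry in this column is nonzero, then normalise it.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def anti_image_covariances(R):
    """Formula (55): Gamma = S^2 R^{-1} S^2 with sigma_j^2 = 1 / r^{jj}."""
    n = len(R)
    R_inv = inverse(R)
    s2 = [1 / R_inv[j][j] for j in range(n)]
    return [[s2[j] * R_inv[j][k] * s2[k] for k in range(n)] for j in range(n)]


def correlations_from_gamma(Gamma):
    """Formula (57): R = S^2 Gamma^{-1} S^2, with S^2 the diagonal of Gamma."""
    n = len(Gamma)
    s2 = [Gamma[j][j] for j in range(n)]
    Gamma_inv = inverse(Gamma)
    return [[s2[j] * Gamma_inv[j][k] * s2[k] for k in range(n)] for j in range(n)]


def image_covariances_from_gamma(Gamma):
    """Formula (58): G = S^2 Gamma^{-1} S^2 + Gamma - 2 S^2."""
    n = len(Gamma)
    R = correlations_from_gamma(Gamma)
    return [[R[j][k] + Gamma[j][k] - (2 * Gamma[j][j] if j == k else 0)
             for k in range(n)] for j in range(n)]


# Hypothetical correlations among three tests (illustrative values only).
R = [[Fraction(1), Fraction(3, 5), Fraction(1, 2)],
     [Fraction(3, 5), Fraction(1), Fraction(2, 5)],
     [Fraction(1, 2), Fraction(2, 5), Fraction(1)]]

Gamma = anti_image_covariances(R)
R_recovered = correlations_from_gamma(Gamma)
G = image_covariances_from_gamma(Gamma)
print([[str(v) for v in row] for row in R_recovered])
```

The round trip returns the original matrix exactly, including its unit diagonal. No corresponding routine starting from $G_n$ alone is offered, in keeping with the statement that $G_n$ by itself does not determine $\Gamma_n$.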

In a later paper, we shall discuss the general theory of the matrices
of linear least-squares image analysis and show some further intimate
connections between $\Gamma_n$ and $G_n$ as well as with other matrices that
occur naturally in the theory (8).

If a determinate common-factor theory holds, so that $\Gamma_n$ tends to a
diagonal matrix (namely $S_\infty^2$), then we have only $G_n$ to deal with and
no initial clues as to its structure. An order-theory may still hold within
$G_n$ regardless. If not, one is up against the problem of rotation of axes
that is traditional to common-factor theory. But at least the indeterminacies
of the communalities and of the common-factor space have been removed.

Empirically, it has often been found that multiple correlations tend to
taper off rapidly as the number of predictors increases. It may often be
expected in practice, then, that with $n$ greater than 10 or 15, say, partial
images and norms will be so close to their total images and norms that the
differences will be negligible, and $n$ can be regarded as virtually
"infinite." Some order structures like the circumplex may require a larger
number of tests than others in order to piece out the necessary details.

On the other hand, it is well-known that the sampling errors associated
with multiple regressions can be quite enormous if the sample of people
used is small. Only large samples of people can be reliably used if $n$ is
substantial, and this is as it ought to be. Any structural analysis of a
universe requires a large sample of people. Small samples may be adequate for
rejecting null hypotheses of zero relationships, but they are not adequate for
estimating the details of involved nonzero relationships. In the future, the
required sampling theory will undoubtedly be forthcoming which will indicate
whether 500, 3,000, or some larger sample of people is needed in order
adequately to study the structure of a given universe of content on the basis
of a sample of $n$ variables.


REFERENCES

1. Guttman, Louis. A note on the derivation of formulae for multiple and partial
   correlation. Ann. math. Statist., 1938, 9, 305-308.
2. Guttman, Louis. Multiple rectilinear prediction and the resolution into
   components. Psychometrika, 1940, 5, 75-99.
3. Guttman, Louis, and Cohen, Jozef. Multiple rectilinear prediction and the
   resolution into components: II. Psychometrika, 1943, 8, 169-183.
4. Guttman, Louis. Review of Thurstone’s Multiple-factor analysis. J. Amer.
   statist. Ass., 1947, 42, 651-656.
5. Guttman, Louis. The Israel alpha technique for scale analysis: a preliminary
   statement (stenciled). The Israel Institute of Applied Social Research,
   Jerusalem, Israel, 1951.
6. Guttman, Louis. The theory of nodular structures. (In preparation)
7. Guttman, Louis. A new approach to factor analysis: The Radex. In Paul F.
   Lazarsfeld, Mathematical thinking in the social sciences. New York: Columbia
   Univ. Press, 1953.
8. Guttman, Louis. The matrices of least-squares image analysis. (In preparation)
9. Guttman, Louis. The existence of total least-squares images and anti-images.
   (In preparation)
10. Guttman, Louis. A reanalysis of factor analysis. (In preparation)
11. Jackson, Dunham. The trigonometry of correlation. Amer. math. Monthly, 1924,
    31, 275-280.
12. Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947.


PSYCHOMETRIKA—VOL. 18, No. 4
DECEMBER, 1953 


A GENERALIZATION OF THURSTONE’S LEARNING FUNCTION* 


HAROLD GULLIKSEN 


PRINCETON UNIVERSITY 
AND 
EDUCATIONAL TESTING SERVICE 


Thurstone’s equation giving the probability of a correct response (p) as a
function of practice time (t) when punishment and reward have equal effects
has been generalized to the case where the effect of punishment is not neces- 
sarily equal to the effect of reward. Since the general equation is somewhat 
unwieldy, three special cases are considered, where reward has no effect, where 
punishment has no effect, and where these effects are equal. Equations are 
given together with tables for making a rectified plot for each of the three 
special cases. 


This paper presents a generalization of the learning function developed 
by Thurstone (2) and will discuss certain interesting special cases of this 
general equation. 


Definitions and Assumptions 
Following Thurstone’s development we will define the following variables:

s = the strength of the correct response, or the number of unit correct
    responses available to the organism at any moment.

e = the strength of the incorrect response, or the number of unit incorrect
    responses available to the organism at any moment.

p = the probability of a correct response.

q = the probability of an incorrect response.

t = practice time.

The relationship between p, q, s, e, and t is assumed to be given by the
following equations:

$$p = \frac{s}{s+e} \tag{1}$$

and

$$q = \frac{e}{s+e}. \tag{2}$$


*This study was supported in part by contract N6onr 270-20 between the Office of 
Naval Research and Princeton University. The opinions expressed are, of course, those of 
the author and do not represent attitudes or policies of the Office of Naval Research. 




These two assumptions are identical with assumptions [1] and [2] from
Thurstone (2). It is also assumed that the variation of s and e with respect
to time is given by

$$\frac{ds}{dt} = kp \tag{3}$$

and

$$\frac{de}{dt} = -cq. \tag{4}$$


These two assumptions are similar to Thurstone’s assumptions [4] and
[6] but are more general because it is not assumed that c = k. In Thurstone's
development it was assumed that the effect of rewarding the correct response
(k) was equal to the effect of punishing the incorrect response (c). Here the
more general case in which these parameters may be either the same or
different is being considered.

From the foregoing set of four equations it is possible to derive the
functional relationship between the probability of a correct response (p)
and the practice time (t). It might be noted that the assumptions used here
are essentially the same as those used in a former paper (1) except that
there the functional relationship between cumulative errors and cumulative
correct responses was obtained, while here the interest is in the relationship
between two other variables, practice time (t) and the probability of a correct
response (p).
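
As a hedged illustration of assumptions (1) to (4), the following minimal Python sketch integrates the two rate equations numerically and records p along the way. The function name `simulate`, the Runge-Kutta scheme, and the parameter values in the usage note are illustrative assumptions, not part of the original paper.

```python
def simulate(k, c, s0, e0, t_end, steps=1000):
    """Integrate ds/dt = k*p and de/dt = -c*q, with p = s/(s+e), q = e/(s+e).

    Returns a list of (t, p) pairs. All names here are hypothetical.
    """
    def deriv(s, e):
        total = s + e
        return k * s / total, -c * e / total

    h = t_end / steps
    s, e = s0, e0
    path = [(0.0, s / (s + e))]
    for i in range(1, steps + 1):
        # One classical fourth-order Runge-Kutta step.
        ds1, de1 = deriv(s, e)
        ds2, de2 = deriv(s + 0.5 * h * ds1, e + 0.5 * h * de1)
        ds3, de3 = deriv(s + 0.5 * h * ds2, e + 0.5 * h * de2)
        ds4, de4 = deriv(s + h * ds3, e + h * de3)
        s += h * (ds1 + 2 * ds2 + 2 * ds3 + ds4) / 6
        e += h * (de1 + 2 * de2 + 2 * de3 + de4) / 6
        path.append((i * h, s / (s + e)))
    return path
```

For example, `simulate(1.0, 0.5, 1.0, 4.0, 10.0)` starts at p = 0.2 and yields a curve in which p rises steadily with practice time.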

Derivation of the General Case
Substituting (2) in (4) gives

$$\frac{de}{dt} = -\frac{ce}{s+e}. \tag{5}$$

If s is expressed as a function of e, and this function substituted in
(5), (5) will then be a differential equation in e and t only and may be
integrated readily. We can obtain s as a function of e by the following procedure:
Dividing (3) by (4) gives

$$\frac{ds}{de} = -\frac{kp}{cq}. \tag{6}$$

Dividing (1) by (2) gives

$$\frac{p}{q} = \frac{s}{e}. \tag{7}$$

Substituting (7) in (6) we have

$$\frac{ds}{de} = -\frac{ks}{ce}. \tag{8}$$




Integrating (8) and evaluating the constant of integration by noting that
when t = 0, s = s_0, and e = e_0, we obtain

$$\log\frac{s}{s_0} = -\frac{k}{c}\log\frac{e}{e_0}. \tag{9}$$

Taking antilogarithms of both sides and solving for s gives

$$s = s_0\left(\frac{e}{e_0}\right)^{-k/c}. \tag{10}$$

Equation (10) expresses s as a function of e and certain constants. Substituting
(10) in (5) and separating variables gives

$$s_0\,e_0^{k/c}\,e^{-(c+k)/c}\,de + de = -c\,dt. \tag{11}$$

Integrating (11) and evaluating the constant of integration by setting e = e_0
when t = 0 gives

$$e - \frac{c\,s_0}{k}\left(\frac{e_0}{e}\right)^{k/c} = -ct + e_0 - \frac{c\,s_0}{k}. \tag{12}$$

Equation (12) gives e as a function of t. In order to obtain the functional
relationship between p and t it is necessary to obtain e as a function of p
and substitute in (12). This can be done as follows: Substituting (10) in
(7) and solving explicitly for e gives

$$e = s_0^{c/(c+k)}\,e_0^{k/(c+k)}\left(\frac{q}{p}\right)^{c/(c+k)}. \tag{13}$$

Substituting (13) in (12) and rearranging terms gives the final solution as
follows:

$$\frac{c}{k}\left(\frac{p}{q}\right)^{k/(c+k)} - \left(\frac{q}{p}\right)^{c/(c+k)}
= c\,e_0^{-k/(c+k)}\,s_0^{-c/(c+k)}\,t - \left(\frac{q_0}{p_0}\right)^{c/(c+k)} + \frac{c}{k}\left(\frac{p_0}{q_0}\right)^{k/(c+k)}, \tag{14}$$

where

p = the probability of a correct response,
q = 1 - p,
k = the effect of reward,
c = the effect of punishment,
p_0 = the value of p when t = 0,
q_0 = 1 - p_0,
e_0 = the value of e when t = 0, and
s_0 = the value of s when t = 0.
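
Because equation (14) gives t explicitly in terms of p but not the reverse, one way to evaluate p at a given practice time is to solve it numerically. The sketch below assumes the form of (14) with the left-hand member (c/k)(p/q)^{k/(c+k)} - (q/p)^{c/(c+k)}, which increases with p, and uses bisection; the function names and the tolerance are hypothetical, and both c and k are assumed to be strictly positive.

```python
def lhs14(p, k, c):
    """Left-hand member of equation (14) as a function of p (requires c, k > 0)."""
    q = 1.0 - p
    return (c / k) * (p / q) ** (k / (c + k)) - (q / p) ** (c / (c + k))


def p_at_time(t, k, c, s0, e0, tol=1e-12):
    """Solve equation (14) for p at practice time t by bisection."""
    p0 = s0 / (s0 + e0)
    scale = s0 ** (c / (c + k)) * e0 ** (k / (c + k))
    target = c * t / scale + lhs14(p0, k, c)
    lo, hi = 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # lhs14 rises with p, so move the bracket toward the target value.
        if lhs14(mid, k, c) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With c = k = 1, s_0 = 1, and e_0 = 4, the curve starts at p = 0.2 and `p_at_time(3.0, 1.0, 1.0, 1.0, 4.0)` returns a value very close to 0.5.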




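Differentiating (14) we have

$$\frac{c}{(c+k)\,q^{2}}\left[\left(\frac{q}{p}\right)^{c/(c+k)} + \left(\frac{q}{p}\right)^{(2c+k)/(c+k)}\right]dp
= c\,e_0^{-k/(c+k)}\,s_0^{-c/(c+k)}\,dt.$$

Solving explicitly for the derivative and simplifying gives

$$\frac{dp}{dt} = (c+k)\,pq\left(\frac{p}{s_0}\right)^{c/(c+k)}\left(\frac{q}{e_0}\right)^{k/(c+k)}, \tag{15}$$

which is always positive and approaches 0 as either p or q approaches 0.
The second derivative is

$$\frac{d^2p}{dt^2} = (c+k)\,pq\left(\frac{p}{s_0}\right)^{2c/(c+k)}\left(\frac{q}{e_0}\right)^{2k/(c+k)}\left[2c + k - 3(c+k)p\right]. \tag{16}$$

As in the case of the first derivative, the second approaches 0 when p
approaches 0 or when q approaches 0. The inflection point is given by the
remaining solution,

$$p = \frac{2c+k}{3c+3k}. \tag{17}$$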

It can be seen that as c and k take different values between 0 and plus
infinity, the inflection point shifts from p = 1/3, when c = 0, to p = 2/3
when k = 0. The inflection point is at p = 1/2 when c = k. It can be seen
that as long as c and k both remain positive, the inflection point can never
be lower than p = 1/3 or higher than p = 2/3. Assuming that c and k are
never negative is equivalent to assuming that reward never decreases the
strength of the correct response and that punishment never increases the
strength of the incorrect response.
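
The bounds on the inflection point can be checked with a small sketch of equation (17). The function name and the error handling are illustrative assumptions.

```python
def inflection_point(c, k):
    """Inflection point p = (2c + k) / (3c + 3k) from equation (17)."""
    if c < 0 or k < 0 or c + k == 0:
        raise ValueError("c and k must be non-negative and not both zero")
    return (2 * c + k) / (3 * c + 3 * k)
```

Evaluating it at c = 0, at c = k, and at k = 0 gives 1/3, 1/2, and 2/3 respectively, and any other non-negative pair falls between the two extremes.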

Equation (14) gives the general relationship between p and t and allows
the effect of punishment to be different from the effect of reward. It is an
equation which is very difficult to fit. Also, the general shape of the curve
does not alter much for large changes in the values of c and k. As c and k
vary from 0 to ∞ the inflection point shifts only from p = 1/3 to p = 2/3
as is shown by equation (17). If the inflection point of the empirical curve
can be rather accurately estimated it might be worth while trying to fit the
general case. However, three special cases can be handled rather readily;
and these cases will now be considered.


Three Special Cases 
Case 1: c = k. The effect of punishment is equal to the effect of reward.
Case 2: c = 0. Learning is entirely due to reward for the correct response.
Case 3: k = 0. Learning is due entirely to punishment of the incorrect
response.


Case 1: c = k. In this case equation (14) becomes

$$z = \frac{k}{\sqrt{m}}\,t + z_0, \tag{18}$$

where

$$z = \frac{p-q}{\sqrt{pq}} = \frac{2p-1}{\sqrt{p-p^2}}, \qquad m = e_0 s_0, \qquad
z_0 = \left(\frac{p_0}{q_0}\right)^{1/2} - \left(\frac{q_0}{p_0}\right)^{1/2}.$$

This is the equation previously developed by Thurstone (2). It can be fitted
readily in its rectified form using the table of z as given by Thurstone (2),
which is the same as Table 1*. Taking the first derivative of p with respect
to t for equation (18) gives

$$\frac{dp}{dt} = \frac{2k}{\sqrt{m}}\,p^{3/2}q^{3/2}, \tag{19}$$

which is always positive. That is, the probability of a correct response
increases steadily with practice time. The second derivative of p with respect
to t is

$$\frac{d^2p}{dt^2} = \frac{6k^2}{m}\,p^2q^2(1-2p). \tag{20}$$

Setting (20) equal to zero we find the inflection point is

$$p = 1/2. \tag{21}$$


It can be seen from equation (18) that as p approaches zero, t approaches
−∞, and as p approaches 1, t approaches +∞. Equation (18) then
represents a symmetrical curve asymptotic to p = 0 and p = 1 with an inflection
point at p = 1/2.
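
The rectified form for Case 1 can be sketched in a few lines, assuming z = (2p - 1)/sqrt(p - p^2) as the tabled function and z = kt/sqrt(m) + z_0 with m = e_0 s_0 as the fitted line. The closed-form inverse below follows from solving the definition of z for p; the function names are hypothetical.

```python
import math


def z_from_p(p):
    """Thurstone's rectifying function z = (2p - 1) / sqrt(p - p^2)."""
    return (2.0 * p - 1.0) / math.sqrt(p - p * p)


def p_from_z(z):
    """Inverse of z_from_p: p = 1/2 + z / (2 * sqrt(z^2 + 4))."""
    return 0.5 + z / (2.0 * math.sqrt(z * z + 4.0))


def p_case1(t, k, s0, e0):
    """Probability of a correct response at time t when c = k."""
    p0 = s0 / (s0 + e0)
    m = e0 * s0
    return p_from_z(k * t / math.sqrt(m) + z_from_p(p0))
```

Plotting `z_from_p` of the observed proportions against t should give an approximately straight line if this case holds; the slope then estimates k/sqrt(m) and the intercept estimates z_0.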

Case 2: c = 0. Learning is due entirely to the effect of rewarding the
correct response. For this case equations (1), (2), and (3) remain the same
as in the general case, while equation (4) becomes

$$\frac{de}{dt} = 0 \quad\text{or}\quad e = e_0. \tag{22}$$

Substituting equations (1) and (22) in (3) we have

$$\frac{ds}{dt} = \frac{ks}{s+e_0}. \tag{23}$$


*The computing for this and subsequent tables was done by Mrs. Gertrude Diederich.


TABLE 1
z as a function of p (from p = .01 to p = .99) for Case 1.
The effect of punishment is equal to the effect of reward (c = k).

[Tabled values not legible in this copy; rows give p in steps of .10, columns in steps of .01.]


Integrating (23), setting s = s_0 when t = 0, and substituting function p
from equation (1) for s gives the solution

$$\frac{p}{1-p} + \log_e\frac{p}{1-p} = \frac{k}{e_0}\,t + \frac{p_0}{q_0} + \log_e\frac{p_0}{q_0}, \tag{24}$$

where

p = the probability of a correct response,
t = practice time,
k = the effect of reward,
e_0 = the value of e when t = 0,
p_0 = the value of p when t = 0, and
q_0 = 1 - p_0.


This equation is in rectified form since the left side is a function of p
only and the right side is a linear function of t. It can be fitted very easily
by making a rectified plot using the values for function p [the left-hand
side of equation (24)] given in Table 2.
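
The rectified plot can also be fitted numerically. The sketch below assumes the left-hand side of equation (24) is p/(1 - p) + log_e[p/(1 - p)] and fits a straight line to it by ordinary least squares; least squares is only one of the "more precise methods" one might choose, and the function names are hypothetical.

```python
import math


def rectified_case2(p):
    """Left-hand side of equation (24): p/q + ln(p/q), with q = 1 - p."""
    q = 1.0 - p
    return p / q + math.log(p / q)


def fit_line(ts, ys):
    """Ordinary least-squares slope and intercept of ys on ts."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t
```

Given observed pairs (t, p), calling `fit_line(ts, [rectified_case2(p) for p in ps])` returns a slope that estimates k/e_0 and an intercept that estimates the value of the function at p_0.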

Equation (24) may also be obtained by setting c = 0 in equation (14).
Under these conditions equation (14) becomes indeterminate and can be 
evaluated by methods of the calculus to give equation (24). 

The first derivative of equation (24) is

$$\frac{dp}{dt} = \frac{k}{e_0}\,p(1-p)^2, \tag{25}$$

which is always positive, indicating that p increases continually with practice
time. The second derivative is

$$\frac{d^2p}{dt^2} = \frac{k^2}{e_0^2}\,p(1-p)^3(1-3p). \tag{26}$$

The second derivative is 0 and the first is positive when

$$p = 1/3, \tag{27}$$


which is the inflection point of equation (24). It can also be seen that (24)
is asymptotic to p = 0 and p = 1. That is, according to this equation, when
learning is dependent on reward only, the inflection point comes at p = 1/3
and the curve approaches the upper asymptote very slowly. This is equivalent
to the case for c = 0 previously developed (1, 416), where it is shown that
the cumulative error curve under these conditions has no upper asymptote.
That is, the subject’s learning record is approaching zero errors per trial,
but does so so slowly that the total number of errors made increases without
limit. The errors constitute a nonconvergent series when the effect of
punishment is negligible.


TABLE 2
z as a function of p (from p = .01 to p = .99) for Case 2.
Learning is due entirely to reward for the correct response,
punishment for the incorrect response having no effect (c = 0).

[Tabled values not legible in this copy; rows give p in steps of .10, columns in steps of .01.]


Case 3: k = 0. Learning is dependent entirely on the effect of punishment.
For this case, equations (1), (2), and (4) remain the same as for the general
case, while equation (3) becomes

$$\frac{ds}{dt} = 0, \quad\text{or}\quad s = s_0. \tag{28}$$

Substituting equations (2) and (28) in equation (4) gives

$$\frac{de}{dt} = \frac{-ce}{s_0+e}. \tag{29}$$

Integrating equation (29), evaluating the constant of integration by setting
e = e_0 when t = 0, and substituting function q from equation (2) for e gives
the solution

$$\log_e\frac{p}{1-p} - \frac{1-p}{p} = \frac{c}{s_0}\,t + \log_e\frac{p_0}{1-p_0} - \frac{1-p_0}{p_0}, \tag{30}$$

where

p = the probability of a correct response,
t = practice time,
c = the effect of punishment,
s_0 = the initial strength of the correct response, and
p_0 = the value of p when t = 0.

This equation is in rectified form since the left-hand member is a function
of p only and the right-hand member is a linear function of t. It can be easily
fitted by using the values for function p [the left-hand side of equation
(30)] given in Table 3. By plotting these values of function p against t,
it can be seen whether the plot is linear or not, and if it is linear the values
for the slope and intercept can be determined either by graphically fitting
a straight line and reading the values from the graph or by more precise
methods. It may be noted that equation (30) can be obtained from equation
(24) by substituting p for q, q for p, and −t for t. It may also be obtained by
setting k = 0 in equation (14), and evaluating the resulting indeterminate
expression.
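
The symmetry between Cases 2 and 3 noted above can be illustrated with a short sketch. It assumes the left-hand members of equations (24) and (30) are p/q + ln(p/q) and ln(p/q) - q/p respectively; the function names are hypothetical.

```python
import math


def rectified_case2(p):
    """Left-hand side of equation (24): p/q + ln(p/q), with q = 1 - p."""
    q = 1.0 - p
    return p / q + math.log(p / q)


def rectified_case3(p):
    """Left-hand side of equation (30): ln(p/q) - q/p, with q = 1 - p."""
    q = 1.0 - p
    return math.log(p / q) - q / p


def symmetric(p, tol=1e-12):
    """Interchanging p and q in the Case 2 function and changing its sign
    (the counterpart of replacing t by -t) gives the Case 3 function."""
    return abs(rectified_case3(p) + rectified_case2(1.0 - p)) < tol
```

At p = 1/2 the Case 3 function equals -1 and the Case 2 function equals +1, which matches the mirror-image relation between the two cases.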
The first derivative of (30) is

$$\frac{dp}{dt} = \frac{c}{s_0}\,p^2(1-p), \tag{31}$$

which is always positive, indicating that p increases with practice time.
The second derivative of equation (30) is

$$\frac{d^2p}{dt^2} = \frac{c^2}{s_0^2}\,p^3(1-p)(2-3p). \tag{32}$$


TABLE 3
z as a function of p (from p = .01 to p = .99) for Case 3,
in which learning is due entirely to punishment for the incorrect response,
reward for the correct response having no effect (k = 0).

[Tabled values not legible in this copy; rows give p in steps of .10, columns in steps of .01.]

The second derivative is 0 and the first is positive when

$$p = 2/3, \tag{33}$$


which is the inflection point of equation (30). It can also be seen that (30)
is asymptotic to p = 0 and p = 1. That is, according to this equation, when
learning is dependent on punishment only the inflection point comes at
p = 2/3 and the curve approaches the upper asymptote very rapidly. This
equation is similar to that given for the case k = 0 (1, 417).

It might be noted that the equations given here, equation (14), the
general form, and also the three special cases, equation (18) for the case
c = k, equation (24) for the case c = 0, and equation (30) for the case k = 0,
are identical with those previously derived (1) except for a change in the
variables. In the former paper the variable (u) representing cumulative
errors was given as a function of the variable (w), cumulative correct responses.
This paper gives the probability of a correct response (p) as a function of
practice time (t). The equations given in the previous paper can be obtained
from the corresponding ones given here by substituting u + w for t,
dw/(du + dw) for p, and making the appropriate rearrangement of terms.

The general equation [20] in the previous paper corresponds to
equation (14) in this paper. The case where c = k [equation (27) in 2] is
identical with equation (18) given here and similar, except for the change of
variables, to equation (30) in (3). The case where c = 0, learning by reward
only, is given on page 416 of (1) and in equation (24) here. The case where
k = 0, learning by punishment only, is given on page 417 of (1) and corresponds
to equation (30) in the present paper.


REFERENCES 


1. Gulliksen, H. A rational equation of the learning curve based on Thorndike’s Law of 
Effect. J. gen. Psychol., 1934, 11, 395-434. 

2. Thurstone, L. L. The learning function. J. gen. Psychol., 1930, 3, 469-493. 

3. Thurstone, L. L. The error function in maze learning. J. gen. Psychol., 1933, 9, 288-301. 


Manuscript received 8/26/58 
Revised manuscript received 6/2/53 




PSYCHOMETRIKA—VOL. 18, No. 4
DECEMBER, 1953


MAXIMIZING THE DISCRIMINATING POWER OF A
MULTIPLE-SCORE TEST*


JANE LOEVINGER, GOLDINE C. GLESER, AND PHILIP H. DUBOIS
WASHINGTON UNIVERSITY


Maximizing the discriminating power of a multiple-score test involves 
maximizing the homogeneity of each subtest and minimizing the correlations 
between subtests. A method is presented for constructing such tests from 
items whose intercorrelations are not too high. Under certain restrictions the 
saturation, defined as the proportion of inter-item covariance to total vari- 
ance, is maximized for each subtest. The nucleus of each subtest is three items 
with high covariances inter se. All items which will lower the saturation are dis- 
carded; the one item is added which will maximize the saturation of the result- 
ant test. This process is repeated until all the items are included or discarded 
for that subtest. If the correlation between any such subtests approaches the 
geometric mean of their saturations, their items form a new pool for one or 
more subtests. Formulas are presented for deciding which items to eliminate 
in order to reduce further the correlations between subtests. 


I. Some Theoretical Considerations 


For a heterogeneous group of items it is often desirable to develop
scoring keys such that each key will constitute a homogeneous subtest and
the keys in conjunction will provide maximum discrimination, i.e., will be
minimally intercorrelated. To date, no rigorous method has been available
which handles these two requirements simultaneously. Factor analysis, a 
possible method, has many drawbacks. Aside from the technical difficulties 
in factoring a large pool of items, a major objection is that the basic assump- 
tion that each item score is the weighted sum of several factors does not fit 
in with the practical problem of assigning an item to a subtest on an all-or- 
none basis. Furthermore, the estimation of communalities in order to deter- 
mine the number of factors to extract is no more rigorous than the procedures 
presented here. 

The aim in constructing homogeneous tests may be expressed as maxi- 
mizing the discriminating power of the test, which has three aspects: fineness 
of discrimination, probability of correct discrimination with respect to 
whatever the test measures, and range of discrimination. 

If one conceives of test construction as adding items one at a time to a
small nucleus, drawing from a finite pool of items in order of the goodness of
the items, coefficients which measure the goodness of the test can be divided






into three groups. Coefficients which measure primarily the fineness of
discrimination will tend to increase with the addition of items. Coefficients
which measure primarily the probability of correct discrimination, which is 
either the same as or closely related to the factorial purity of the test, will 
tend to decrease with the addition of items. Coefficients which measure both 
the fineness of discrimination and the probability of correct discrimination 
may increase at first and then decrease. Intuitively, one feels the need for 
such a maximizing function to aid in deciding when to stop adding items. 

Two coefficients previously proposed, Ferguson’s (2) coefficient of test 
discrimination, and Kuder and Richardson’s (4) formula 20 (hereafter 
referred to as KR 20), have this maximizing property. Ferguson’s coefficient 
lacks algebraic properties and has not been related to an explicit system of 
test construction. Our method of test construction is based mainly on the 
saturation coefficient, defined as the ratio of the sum of all the inter-item 
covariances to the total variance of the test. KR 20 is equal to the saturation 
coefficient times n/(n — 1), where n is the number of items. 

The variance of a test may be expanded as a function of the variances
and covariances of the items:

$$V_t = \sum_{i=1}^{n} V_i + 2\sum_{i<j=1}^{n} C_{ij}, \tag{1}$$

where $V_i$ is the variance of item i, $V_t$ is the variance of the test, and
$C_{ij}$ is the covariance of item i with item j. The saturation coefficient, S, is

$$S = 2\sum_{i<j=1}^{n} C_{ij}\Big/\Big(\sum_{i=1}^{n} V_i + 2\sum_{i<j=1}^{n} C_{ij}\Big)
= \Big(V_t - \sum_{i=1}^{n} V_i\Big)\Big/V_t. \tag{2}$$
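
To make equations (1) and (2) concrete, the sketch below computes the saturation coefficient and the corresponding KR 20 value from item variances and covariances. The numbers are the variances and covariances of the three-item nucleus in the paper's sample Table 1; the function names are hypothetical.

```python
def saturation(variances, covariances):
    """Saturation S = 2*sum(C_ij) / (sum(V_i) + 2*sum(C_ij)), equation (2).

    `covariances` holds each C_ij once, for i < j.
    """
    v = sum(variances)
    c = sum(covariances)
    return 2.0 * c / (v + 2.0 * c)


def kr20(variances, covariances):
    """KR 20 expressed as the saturation coefficient times n / (n - 1)."""
    n = len(variances)
    return saturation(variances, covariances) * n / (n - 1)


# Items 109, 117, and 110 from the sample Table 1.
NUCLEUS_VARIANCES = [0.2447, 0.2483, 0.2183]
NUCLEUS_COVARIANCES = [0.0850, 0.0775, 0.0642]
```

For this nucleus the saturation comes out at roughly 0.39, and KR 20 is one and a half times that, since n/(n - 1) = 3/2 for three items.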
Maximizing the saturation of a test drawn from a finite pool of items
will not necessarily maximize the discriminating power of the test except
under certain conditions. One condition is that the intercorrelations of the
items not be too high. For tests with very high item intercorrelations,
maximizing the saturation will definitely not maximize the discriminating power.

The second restriction is that the original nucleus be more than two
items, say, three or four. This insures the test against being too highly
diverted in the direction of the unique content of any one or two items.

The third restriction is that any item excluded from the test at any
stage shall not be considered for inclusion at a later stage. The purpose of
this restriction is to prevent "functional drift" of the test, that is, inclusion
of items measuring function A, then those measuring functions A and B,
then those measuring B alone. Apparently this restriction is sufficiently
stringent so that items unrelated to the central factor in the test can scarcely
be included. Without this restriction items having no relation to the original
nucleus might in some cases be included. The restriction is not strong enough
to insure that no group factors exist among the items. The existence of group
factors will raise the saturation but lower the extent to which the saturation
(or, more properly, KR 20) can be considered a measure of the proportion
of first factor variance.

The discriminating power of a multiple-score test has one more aspect 
than the discriminating power of a single test, i.e., the degree of independence 
of the subtests. In this connection the Jackson and Ferguson (3) derivation 
of KR 20 shows that KR 20 is equal to the correlation between two tests 
which have the same mean inter-item covariance, when the mean covariance 
between two tests is equal to the mean covariance within each. On the basis 
of this relationship, the upper limit of the correlation between two tests 
should be approximately the geometric mean of their saturations. When 
two or more tests are found whose intercorrelations are almost equal to their 
saturations, those tests are considered a new pool of items and subtests are 
again constructed beginning with new nuclei. In the application of the 
Jackson-Ferguson relationship, the difference between KR 20 and the satura- 
tion coefficient is of little importance, as the ratio of the two coefficients is 
almost one, and attention is paid to the order of magnitude rather than the 
exact value of the saturation. 

After the most highly saturated tests are constituted from the several 
pools of items, and after the most highly related tests are reconstituted or 
combined, there remain several possibilities for attenuating the discriminating 
power. An item may have been omitted because it did not fall in the original 
pool of items from which the test was drawn. An item may be included in a 
test even though it is equally or more closely related to another test. The 
discriminating power of the test can be increased by adding some items to 
subtests and dropping others. The aim is to make the intercorrelations low 
rather than exactly zero, since the latter is generally not possible without
sacrificing saturation.

Improving the rigor of the present method of test construction might
lie in the direction of evaluating the difference between the proportion of
first-factor variance and the proportion of common-factor variance. KR 20
is an upper limit of the former and at least an approximate lower limit of
the latter. The smaller the difference between the two, the purer the test.


II. Method


For the present method items were either given as dichotomous or
reduced to dichotomous form. There were not many items with very high
intercorrelations. Since the sampling errors involved were known only roughly,
a large number of cases was required. The use of exactly 1000 cases saves
many hours of labor, since all divisions by N may be accomplished by shifting
the decimal place. Cross-validation data from one study appear to indicate
that useful results might be obtained with as few as 300 cases.

The method was originally devised for constructing homogeneous keys
for a biographical inventory; however, it can be used as well on other types
of data, such as interest tests or multiphasic personality tests. The method
can be used for the discovery of traits or of types of people; there appear to
be no assumptions which limit it in this respect.

Ideally the method should be used with the matrix of the covariances of 
every item with every other one. With large pools of items, there are mechani- 
cal complications in obtaining and handling such a matrix. The present 
cycling method was evolved to handle large numbers of items without com- 
puting the complete matrix of covariances. 

The first step is reading the test and formulation of hypotheses as to 
possible interrelations of the items. Items are then grouped according to 
these hypotheses and apparent similarity of content. 


Maximizing the Saturation of a Test 


The procedure for maximizing the saturation of a test is as follows:
From the matrix of inter-item covariances of a given group of items, the
triplet of items with highest covariances inter se is chosen as a nucleus.
These three items comprise a test. All items are discarded from consideration
which would lower the saturation of that three-item test. The one item is
added which will maximize the saturation of the resultant four-item test.
Then all remaining items which would lower the saturation of that four-item
test are discarded, and the one is added which will maximize the saturation
of the resultant five-item test, and so on. The process terminates when all
items are either included in the test or excluded from the pool.
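
The cycle described above can be sketched as a short greedy routine. This is a minimal illustration rather than the authors' worksheet procedure: it assumes the full covariance matrix is available, it reads "highest covariances inter se" as the triplet with the largest sum of its three covariances, and it works with the ratio of summed covariances to summed variances, which rises and falls with the saturation. The names `build_subtest`, `VAR`, and `COV` are hypothetical; the data are the five items of the paper's sample Table 1.

```python
from itertools import combinations

# Variances and covariances of five items, copied from the sample Table 1.
VAR = {109: 0.2447, 117: 0.2483, 110: 0.2183, 124: 0.1124, 95: 0.2402}
COV = {
    frozenset((109, 117)): 0.0850, frozenset((109, 110)): 0.0775,
    frozenset((109, 124)): 0.0309, frozenset((109, 95)): 0.0482,
    frozenset((117, 110)): 0.0642, frozenset((117, 124)): 0.0418,
    frozenset((117, 95)): 0.0320, frozenset((110, 124)): 0.0395,
    frozenset((110, 95)): 0.0371, frozenset((124, 95)): 0.0197,
}


def build_subtest(var, cov):
    """Greedy construction of one subtest; returns (items, covariance ratio)."""
    def c(i, j):
        return cov[frozenset((i, j))]

    items = sorted(var)
    # Assumption: the nucleus is the triplet with the largest covariance sum.
    nucleus = max(combinations(items, 3),
                  key=lambda t: c(t[0], t[1]) + c(t[0], t[2]) + c(t[1], t[2]))
    test = list(nucleus)
    sum_c = sum(c(i, j) for i, j in combinations(test, 2))
    sum_v = sum(var[i] for i in test)
    pool = [i for i in items if i not in test]
    while pool:
        ratio = sum_c / sum_v
        item_c = {k: sum(c(k, i) for i in test) for k in pool}
        # Items that would lower the ratio are dropped for good.
        pool = [k for k in pool if item_c[k] / var[k] >= ratio]
        if not pool:
            break
        # Add the one item giving the highest ratio for the enlarged test.
        best = max(pool, key=lambda k: (sum_c + item_c[k]) / (sum_v + var[k]))
        test.append(best)
        sum_c += item_c[best]
        sum_v += var[best]
        pool.remove(best)
    return test, sum_c / sum_v


ORDER, RATIO = build_subtest(VAR, COV)
```

On these five items the routine picks items 109, 110, and 117 as the nucleus and then adds item 124 followed by item 95, which is the order of entry shown in the sample table. The order of items inside the nucleus is arbitrary.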


In order to maximize the saturation one need only maximize a simpler
quantity,

$${}_{n}W_t = \sum_{i<j=1}^{n} C_{ij}\Big/\sum_{i=1}^{n} V_i, \tag{3}$$

in which the subscript t on the ratio W means that it is a property of the
test, and the prescript n refers to the number of items in the test. The quantity
${}_{n}W_t$, which might be called the “covariance ratio,” changes as each
item is added to the test.

The proof that maximizing ${}_{n}W_t$ will maximize the saturation is simple.
The saturation is a quantity of the form 2C/(V + 2C), where the capitals
V and C are used to designate the sums rather than the elements
of the sum. To maximize the saturation one needs only to minimize its
reciprocal, (V + 2C)/2C = (V/2C) + 1. As constants may be disregarded,
one needs to minimize V/C, or maximize C/V.


The next step is to find a criterion for the exclusion of items. Let us
define a ratio ${}_{n}W_k$ characterizing each item k not included in the test:

$${}_{n}W_k = \sum_{i=1}^{n} C_{ik}\Big/V_k, \tag{4}$$

where the subscript k indicates that the W is a property of the item k, the
prescript n means that there are n items in the test, k not being one of the
first n items in the test. It can be shown that an item k will not lower the
saturation of the test if

$${}_{n}W_k \ge {}_{n}W_t. \tag{5}$$

The proof of this statement is as follows. One wishes to find the property
of that item k which, when added to the test, will not lower the $W_t$ ratio.
This condition may be expressed:

$${}_{n+1}W_t \ge {}_{n}W_t.$$

Substituting from equation (3), we have

$$\Big(\sum_{i<j} C_{ij} + \sum_{i} C_{ik}\Big)\Big/\Big(\sum_{i} V_i + V_k\Big) \ge \sum_{i<j} C_{ij}\Big/\sum_{i} V_i.$$

Since all variances are positive, we may multiply by the denominators
without changing the sign of the inequality.

$$\sum_{i} V_i\Big(\sum_{i<j} C_{ij} + \sum_{i} C_{ik}\Big) \ge \sum_{i<j} C_{ij}\Big(\sum_{i} V_i + V_k\Big).$$

Cancelling like terms and dividing again by the variance terms yields

$$\sum_{i} C_{ik}\Big/V_k \ge \sum_{i<j} C_{ij}\Big/\sum_{i} V_i.$$

Thus the inequality expressed in formula (5) is established. The same proof
may be used to show that item k will lower the saturation if ${}_{n}W_k < {}_{n}W_t$.
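
The equivalence established by this proof can be spot-checked with exact arithmetic. The sketch below compares, over a small grid of hypothetical positive sums, whether adding an item keeps the test's covariance ratio from falling with whether the item's own ratio is at least the test's ratio. The grid values and function names are illustrative assumptions.

```python
from fractions import Fraction


def covariance_ratio(cov_sum, var_sum):
    """Ratio of a covariance sum to a variance sum, as an exact fraction."""
    return Fraction(cov_sum) / Fraction(var_sum)


def criterion_agrees(cov_sum, var_sum, item_cov_sum, item_var):
    """True when formula (5) and the direct before/after comparison agree."""
    before = covariance_ratio(cov_sum, var_sum)
    after = covariance_ratio(cov_sum + item_cov_sum, var_sum + item_var)
    item = covariance_ratio(item_cov_sum, item_var)
    return (after >= before) == (item >= before)


# Hypothetical integer sums; variances are kept strictly positive.
ALL_AGREE = all(
    criterion_agrees(c_test, v_test, c_item, v_item)
    for c_test in range(1, 6) for v_test in range(1, 6)
    for c_item in range(0, 6) for v_item in range(1, 6)
)
```

Exact fractions are used so that ties, where the item's ratio equals the test's ratio, are not disturbed by rounding.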

Worksheets for constructing tests by the present method are shown 
in Tables 1 and 2, which must be constructed simultaneously. The right 
side of Table 1 consists of a table of covariances for items included in the test. 


TABLE 1
Synthesis of Test Statistics: A Sample Table

 nWt     ΣVi      Vi     Item     117           110           124           95
                 .2447   109     .0850         .0775         .0309         .0482
                 .2483   117                   .0642         .0418         .0320
 .319   .7113    .2183   110               ΣΣC = .2267       .0395         .0371
 .411   .8237    .1124   124                             ΣΣC = .3389       .0197
 .447  1.0639    .2402    95                                           ΣΣC = .4759




After the original nucleus of items is chosen, the first three covariances 
are entered on the right side of Table 1. Their sum is entered in the (3, 3) 
cell of the principal diagonal. The variances of the first three items are entered 
in the first column to the left of the vertical item identification, and the sum 
of the first three variances is entered in the next column leftward. The first 
test covariance ratio is entered in the leftmost column; it is equal to the ratio 
of the sum of the first three covariances to the sum of the first three variances. 

TABLE 2
Pool of Items: A Sample Table

Item            59      69      70      95      96      124
V_i            .2485   .1957   .2481   .2402   .2491   .1124
Σ³C            .0882   .0645   .0738   .1174   .0767   .1122
3W_i           .355    .330    .297    .489    .308    .998
Trial 4W_t     .328    .321    out     .362    out     .411
Σ⁴C            .0987   .0721           [...]           in
4W_i           .397    .368            [...]
               out     out             in

At this stage it is convenient to have Table 2 drawn up but no entries
made in it. For each item in turn the quantity ${}_3W_i$ is now computed. If the
covariance ratio for the item exceeds the covariance ratio for the test, then
the identifying symbol of the item is entered in the first row, its variance is
entered in the same column, second row, and the sum of its covariances
with the first three items is entered in the same column, third row. This step
is completed for the entire original matrix of items. Most of the items will be
rejected at this step and thus will not appear in either table.

The next step is to compute a trial ${}_4W_t$ for each item in Table 2. The
trial ${}_4W_t$ is equal to the sum of covariances of the test plus the sum of
covariances of the item, divided by the sum of variances for the test plus the
item variance. The values for the test are found in Table 1, the corresponding
values for the item are found in the appropriate column of Table 2.

The item which has the highest trial ${}_4W_t$ is selected as the fourth test
item. Its covariances with the three items already in the test are entered in
the right side of Table 1, and its variance is entered in the column of Table 1
labelled $V_i$. The three covariances just entered in the table are now added
to the previous total, found in cell (3, 3), and the new total is entered in
cell (4, 4). The new sum of variances is obtained by adding the new variance
to the previous sum of variances. The new test covariance ratio, ${}_4W_t$, is
obtained by dividing the sum of covariances by the sum of variances. The
value obtained should check exactly with the corresponding value in the
"Trial ${}_4W_t$" row of Table 2. It will be convenient to draw a heavy line down
the column of Table 2 corresponding to the item selected for the test.

For each item a new sum of covariances is obtained by adding its
covariance with the fourth item to its previous sum of covariances. The values
are entered in the row of Table 2 labelled Σ⁴C. The sum of covariances
for each item is divided by its variance. These covariance ratios need not be
recorded, but for those items where the ratio is less than the test covariance
ratio, an indication must be made that the item no longer is in the pool.
For those items remaining in the pool, a trial ${}_5W_t$ is computed, and so on.
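The worksheet procedure just described can be sketched in code. This is an illustration under stated assumptions rather than the authors' own routine: the function names are hypothetical, the data are the variances and covariances of the five items shown in Table 1, and the pool is limited to items 95 and 124 because those are the only pool items whose covariances appear in full.

```python
from itertools import combinations

# Item variances and covariances as read from Table 1.
VAR = {109: 0.2447, 117: 0.2483, 110: 0.2183, 124: 0.1124, 95: 0.2402}
UPPER = {
    (109, 117): 0.0850, (109, 110): 0.0775, (109, 124): 0.0309, (109, 95): 0.0482,
    (117, 110): 0.0642, (117, 124): 0.0418, (117, 95): 0.0320,
    (110, 124): 0.0395, (110, 95): 0.0371,
    (124, 95): 0.0197,
}
# Make the lookup symmetric so that either item order may be used.
COV = {**UPPER, **{(b, a): v for (a, b), v in UPPER.items()}}


def covariance_ratio(cov, var, items):
    """Sum of covariances over all pairs divided by the sum of variances."""
    return sum(cov[pair] for pair in combinations(items, 2)) / sum(var[i] for i in items)


def build_key(cov, var, nucleus, pool):
    """Repeatedly add the pool item that gives the highest trial ratio."""
    key, pool = list(nucleus), list(pool)
    history = [covariance_ratio(cov, var, key)]
    while pool:
        w_t = history[-1]
        # Items whose own covariance ratio falls below the test ratio leave the pool.
        pool = [k for k in pool if sum(cov[(i, k)] for i in key) / var[k] >= w_t]
        if not pool:
            break
        best = max(pool, key=lambda k: covariance_ratio(cov, var, key + [k]))
        key.append(best)
        pool.remove(best)
        history.append(covariance_ratio(cov, var, key))
    return key, history


key, history = build_key(COV, VAR, [109, 117, 110], [95, 124])
print(key, [round(w, 3) for w in history])
```

With these figures the sketch adds item 124 and then item 95, and the successive test ratios agree with the leftmost column of Table 1.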


The possibility exists that when a test has been fully constituted, some
items added early in the process may have ceased to contribute to the
saturation. In order to test for this possibility, one may compute for each item
the covariance ratio for that item with the test minus that item. If this ratio is
less than the final ${}_nW_t$ of the test, that item no longer contributes to the
saturation of the test. The condition for excluding an item which has been
included in a test may be expressed:

$$\frac{\sum_{i \ne k} C_{ik}}{V_k} < \frac{\sum_{i<j} C_{ij}}{\sum_{i} V_i}. \tag{6}$$


The proof is identical with that of formula (5) above. 
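The check of formula (6) on a completed test can be sketched as follows. The function name is hypothetical; the first five items carry the Table 1 figures, and item "x" is an invented item with uniformly small covariances, added only to show an item being flagged.

```python
VAR = {109: 0.2447, 117: 0.2483, 110: 0.2183, 124: 0.1124, 95: 0.2402, "x": 0.25}
UPPER = {
    (109, 117): 0.0850, (109, 110): 0.0775, (109, 124): 0.0309, (109, 95): 0.0482,
    (117, 110): 0.0642, (117, 124): 0.0418, (117, 95): 0.0320,
    (110, 124): 0.0395, (110, 95): 0.0371, (124, 95): 0.0197,
}
# Hypothetical item "x": covariance of .01 with every other item.
UPPER.update({(i, "x"): 0.01 for i in (109, 117, 110, 124, 95)})
COV = {**UPPER, **{(b, a): v for (a, b), v in UPPER.items()}}


def items_to_drop(cov, var, items):
    """Return the test ratio and the items that satisfy condition (6)."""
    pairs = [(a, b) for n, a in enumerate(items) for b in items[n + 1:]]
    w_t = sum(cov[p] for p in pairs) / sum(var[i] for i in items)
    flagged = []
    for k in items:
        # Covariance ratio of item k with the test minus item k.
        w_k = sum(cov[(i, k)] for i in items if i != k) / var[k]
        if w_k < w_t:
            flagged.append(k)
    return w_t, flagged


print(items_to_drop(COV, VAR, [109, 117, 110, 124, 95]))
print(items_to_drop(COV, VAR, [109, 117, 110, 124, 95, "x"]))
```

In this sketch none of the five Table 1 items is flagged, whereas the invented item is.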


Construction of the Multiple-Score Test 


Cycle I keys are evolved from the a priori matrices by the method
described above. After one key is constructed from a matrix, the entire
original matrix is utilized in constructing further keys. It was thought
desirable at first to exclude those items in the first key from consideration for
later keys, but this course probably is disadvantageous. An item which is
drawn into the first key as one of the last items may more properly appear
as one of the first items of a second key. It would then probably belong with
the second key. Items closely related to both keys may often best be omitted
from both, since they tend to raise the correlation between the two keys.

All items which are not included in any key are placed in a residual
matrix. The residual matrix is treated the same way as the a priori matrices
are treated; that is, the covariances of all items are obtained and the total
matrix examined for new keys. The keys derived from the a priori matrices
plus those derived from the residual matrix now constitute the Cycle I keys.

Cycle I keys are scored and correlated. The matrix of intercorrelations of
Cycle I keys is examined for high values, say, above .30. These
are the correlations which must be reduced in order to have relatively
independent tests, and insofar as possible this reduction must take place without
impairing the saturation of the tests.

If there are two or more keys which have correlations inter se approaching
in magnitude their saturations, all of the items are placed in a new pool
from which Cycle IA keys are constructed to replace the corresponding
Cycle I keys. There may be two or more such groups of closely related keys.
Each group of keys is, of course, treated separately. Cycle IA keys are
constructed by the method used for Cycle I tests. Cycle IA keys are now scored
and correlated with each other and with those of the Cycle I keys retained
without change.

The next step is to obtain the point biserial correlation of every key,
i.e., the Cycle IA keys plus the Cycle I keys that were not replaced in Cycle
IA, with every item in the original pool. These correlations comprise a matrix
with one column for each key and one row for each item in the pool. In
general, it is necessary to apply a correction to the point biserial between
the item and its own key to compensate for the spurious correlation. In
practice, for many items the outcome will by inspection either be so high or
so low that the actual computations need not be made. The formula for this
correction is as follows:

$$r_{i(T-i)} = \frac{r_{iT}\sigma_T - \sigma_i}{\sqrt{\sigma_T^2 - 2r_{iT}\sigma_i\sigma_T + \sigma_i^2}}, \tag{7}$$

where $r_{i(T-i)}$ is the corrected point biserial, $r_{iT}$ is the uncorrected point
biserial, and $\sigma_i$ and $\sigma_T$ represent standard deviations of item and key. A
useful approximation is given by:

$$r_{i(T-i)} \cong r_{iT} - \sigma_i/\sigma_T. \tag{8}$$
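A short sketch may help in applying formulas (7) and (8). The function names are hypothetical, and the illustrative figures are assumptions built from Table 1: item 124 (variance .1124) is taken as part of a key whose remaining three items have a total-score variance of .7113 + 2(.2267) = 1.1647 and a covariance of .1122 with the item.

```python
import math


def corrected_point_biserial(r_it, sd_item, sd_key):
    """Formula (7): correlation of item i with its key after removing the item."""
    numerator = r_it * sd_key - sd_item
    denominator = math.sqrt(sd_key**2 - 2 * r_it * sd_item * sd_key + sd_item**2)
    return numerator / denominator


def approx_corrected_point_biserial(r_it, sd_item, sd_key):
    """Formula (8): the simpler approximation to the same correction."""
    return r_it - sd_item / sd_key


# Illustrative figures (assumed, built from Table 1).
var_item, var_rest, cov_item_rest = 0.1124, 1.1647, 0.1122
var_key = var_rest + var_item + 2 * cov_item_rest        # key including the item
r_uncorrected = (cov_item_rest + var_item) / math.sqrt(var_item * var_key)
r_direct = cov_item_rest / math.sqrt(var_item * var_rest)  # item with key minus item

r_exact = corrected_point_biserial(r_uncorrected, math.sqrt(var_item), math.sqrt(var_key))
r_approx = approx_corrected_point_biserial(r_uncorrected, math.sqrt(var_item), math.sqrt(var_key))
print(round(r_uncorrected, 3), round(r_direct, 3), round(r_exact, 3), round(r_approx, 3))
```

Formula (7) reproduces the directly computed correlation of the item with the rest of the key. For a key as short as four items the approximation (8) is noticeably rougher; it can be expected to improve as the key's standard deviation grows relative to the item's.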


The matrix of point biserials is utilized to drop items from or add items
to keys, primarily to lower the correlations between keys, but in some cases
also to raise the saturation. There are three major considerations in examining
this matrix. The first is that every item should have its highest correlation
with its own key. Items with fairly equal correlation with two or more keys
are often best omitted entirely, since they are the items which raise the
correlations between keys. Occasionally one will find that key A and key B
will be positively correlated but that item i will enter A in a positive sense
and enter B in a negative sense. In this case inclusion of the item in both
keys acts to lower the correlation between them.

The second consideration is that some items not included in any Cycle I
key may have a high correlation with just one of those keys. This will occur
[...]

When the complete matrix of covariances is available, the correlations
between any key and any other or between a key and any item can quickly
be recomputed after each deletion or addition of an item to the key. When the
complete matrix of covariances is not available, the most practicable
procedure is to make whichever few changes for each test are most clearly
indicated. After such changes the new tests are called the Cycle II tests.
These tests are scored and correlated, and the biserial correlation of each
test with each item is again obtained. The same considerations are applied
to obtain Cycle III tests, and so on. The process terminates automatically
when there are no further changes.
The following formulas are useful in carrying out the cycling process.
If item i is not included in either test $T_1$ or test $T_2$, then adding i to $T_1$
will not raise the correlation between $T_1$ and $T_2$ if

$$\frac{r_{iT_2}}{r_{iT_1} + \sigma_i/2\sigma_{T_1}} < r_{T_1T_2}. \tag{9}$$

If item i is included in $T_1$, then the correlation between $T_1$ and $T_2$
will be lowered by dropping i from $T_1$ if

$$\frac{r_{iT_2}}{r_{iT_1} - \sigma_i/2\sigma_{T_1}} > r_{T_1T_2}. \tag{10}$$

The test ratio of formula (4) can be obtained from the point biserial
correlation by means of the following formula:

$${}_nW_i = r_{iT}\,\sigma_T/\sigma_i. \tag{11}$$

Unity must be subtracted from the right-hand side if the item i is included
in the test T. This formula enables one to determine whether a given item
will lower the saturation of a key when the item was not included in the
matrix from which the key was drawn.
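Formula (11), together with the rule about subtracting unity, can be sketched as below. The function name and the `item_in_key` flag are hypothetical, and the illustrative figures are assumptions built from Table 1 (item 124 against the three-item nucleus, whose total-score variance is .7113 + 2(.2267) = 1.1647).

```python
import math


def item_ratio_from_point_biserial(r_it, sd_key, sd_item, item_in_key=False):
    """Formula (11); unity is subtracted when the item is part of the key."""
    w = r_it * sd_key / sd_item
    return w - 1.0 if item_in_key else w


# Illustrative figures (assumed, built from Table 1).
var_item, var_rest, cov_item_rest = 0.1124, 1.1647, 0.1122

# Case 1: the item is not in the key.
r_out = cov_item_rest / math.sqrt(var_item * var_rest)
w_out = item_ratio_from_point_biserial(r_out, math.sqrt(var_rest), math.sqrt(var_item))

# Case 2: the same item scored as part of the key.
var_key = var_rest + var_item + 2 * cov_item_rest
r_in = (cov_item_rest + var_item) / math.sqrt(var_item * var_key)
w_in = item_ratio_from_point_biserial(
    r_in, math.sqrt(var_key), math.sqrt(var_item), item_in_key=True
)
print(round(w_out, 3), round(w_in, 3))
```

Both routes return the ratio of the item's summed covariances to its variance (about .998 here, the value entered for item 124 in Table 2), which is what formula (4) defines.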

When the complete matrix of item covariances is available, it appears
to be advantageous to begin by picking out all of the nuclei and to construct
the subtests simultaneously. Each subtest will begin with a nucleus of three
items, then a fourth will be added to each, then a fifth, and so on. When this
method is followed, an item which is used in one test is not considered for
others. Working from the complete matrix of covariances is probably more
economical than following a cycling procedure in most cases. Machine
techniques for handling large matrices of covariances will be presented in a
later paper.
REFERENCES

1. DuBois, P. H., Loevinger, Jane, and Gleser, Goldine C. The construction of homogeneous
keys for a biographical inventory. Res. Bull. 52-18, Air Training Command, Hum. Res.
Res. Center, 1952, Lackland Air Force Base.

2. Ferguson, G. A. On the theory of test discrimination. Psychometrika, 1949, 14, 61-68.

3. Jackson, R. W. B., and Ferguson, G. A. Studies on the reliability of tests. Bull. no. 12,
Dept. Educ. Res., University of Toronto, 1941.

4. Kuder, G. F., and Richardson, M. W. The theory of the estimation of test reliability.
Psychometrika, 1937, 2, 151-160.


Manuscript received 1/6/53
Revised manuscript received 6/7/53


PSYCHOMETRIKA—VOL. 18, NO. 4
DECEMBER, 1953 


SOME APPLICATIONS OF ATTRIBUTE PREDICTION THEORY 
TO PSYCHOPHYSICS 


Norman C. PERRY 
SAN JOSE STATE COLLEGE 


The following development is a coordination of certain aspects of mental
test methods and psychophysics. The biserial and triserial prediction equations
of the Katzell-Cureton theory are utilized to reformulate the determination of
absolute and differential limens from data produced by the constant method.


Introduction 


Katzell and Cureton (4) describe a method for predicting the probability
of an individual's falling in one of two categories of a continuous dependent
variable (y) from a knowledge of his score on an independent test variable
X, as for example in the prediction of pilot success as a function of stanine
test score. A solution to the converse of this problem (i.e., obtaining X as a
function of a preassigned probability of inclusion in one of two categories)
was developed by Guilford and Michael (2). The theory, as carried out
in these two papers, has a natural application to the psychophysical method
of constant stimuli.

An extension to three categories of the entire theory was fully developed
by Michael and Perry (5), the principal new feature yielded by the
generalization being that for a given probability of inclusion in the middle category
of three, there correspond two values of X. This is intuitively plausible
since, in terms of an academic example, a student may be so bright that he
has only one chance in four of getting a C in a course, or he may have so
low an I.Q. that he has only one chance in four of getting a C. The essential
features of three-category methods are easily applied to the psychophysical
method of constant stimulus difference.

The entire line of development will be readily seen to relate other minor 
aspects of psychophysics to test theory. 

The mathematical essence of the Katzell-Cureton approach is to set
up a least-squares equation relating y to X, on the assumptions of linearity
of regression of the dichotomized variable on the independent variable, and
normality and homoscedasticity of column array. It is further assumed
that y (before dichotomization) is reasonably approximated by a unit normal
variable. The correlation between y and X is, of course, a biserial one denoted
by the usual symbol $r_b$.


For prediction of y from X, therefore, we have

$$y_x = \frac{r_b}{\sigma_x}(X - M_x), \tag{1}$$

with standard error of estimate given by the equation

$$\sigma_{yx} = \sigma_y\sqrt{1 - r_b^2}. \tag{2}$$

These equations yield the predicted (mean) value and standard deviation
of y for any given value of X. The probability of an individual's
membership in a category is now determined by finding the proportions of
area of the y-array which lie above or below the point of dichotomy. The
point of dichotomy is expressed by the standard score of the point of division
of the marginal distribution of y, a value which will be denoted hereafter
by $z_y$.

Specifically, in order to ascertain the likelihood of membership in either
category for the X value under consideration, one obtains the difference
$z_y - y_x$, and divides this difference by the standard deviation of the array
to obtain the new standard score value

$$z_y' = \frac{z_y - y_x}{\sigma_{yx}}. \tag{3}$$

Finally, a table of normal curve areas applied to $z_y'$ yields the required
probabilities.
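Equations (1) to (3) and the table lookup can be put together in a few lines. This is a sketch only: the function name is hypothetical, the standard library's `NormalDist` stands in for the table of normal curve areas, and y is taken as a unit normal variable so that its standard deviation is 1.

```python
from statistics import NormalDist


def prob_upper_category(x, mean_x, sd_x, r_b, z_cut):
    """Probability that the response falls above the point of dichotomy z_cut."""
    y_pred = (r_b / sd_x) * (x - mean_x)      # equation (1)
    sd_array = (1.0 - r_b**2) ** 0.5          # equation (2), with sigma_y = 1
    z_prime = (z_cut - y_pred) / sd_array     # equation (3)
    return 1.0 - NormalDist().cdf(z_prime)    # area of the array above the cut


# Illustrative call with assumed values for the mean, standard deviation,
# biserial correlation, and point of dichotomy.
for x in (8, 10, 12):
    print(x, round(prob_upper_category(x, 10.0, 1.414, 0.905, 0.2845), 3))
```

The probability rises with X when the correlation is positive, and equals .5 at the value of X whose predicted y coincides with the point of dichotomy.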
The central concept involved in applying these equations to psychophysics
is to let X be the stimulus value, and y be the response value. Several
immediately resulting minor syntheses of psychophysics and test theory
should be pointed out. Such an approach unifies the ogive assumption in
mental test theory of the probability of item success as a function of ability,
with the phi-gamma hypothesis in the method of constant stimuli. Both
of these follow immediately from the linear regression and normality of
column array assumed in the mathematical theory just outlined. In addition,
the quantity $r_b$ can be interpreted as a measure of the sensitivity of sensory
discrimination for a given observer.

Traditionally, the ability to discriminate in the psychophysical context
has often been thought of as the reciprocal of the constant of proportionality
K in Weber's Law. For example, an individual who can detect a difference
as small as 2% in lifted weights is much more sensitive than one who can
detect only a difference of 10%. The former's power of differentiation (as
measured by 1/K) is to the latter's as 50 is to 10. However, as an alternative
rationale, a sensitive observer may be thought of as one who, on different
occasions, tends to respond in nearly the same way to equal stimuli.
Psychophysical data collected from the reports of such an observer would yield
a regression of y on X with small column standard deviations. Thus, in
general, high sensitivity would be associated with large $r_b$, and low sensitivity
with a reduced correlation.


Application of Two-Category Prediction to the Method 
of Constant Stimuli 


In Guilford (1, 168) we find psychophysical data relating X (millimeters
separation of an aesthesiometer) to p (the proportion of 100 trials in which a
two-point experience was felt). In terms of these symbols the data are
summarized below:

X    12    11    10     9     8
p   .93   .66   .29   .05   .01


In Guilford and Michael (2) we find the converse of equation (1) developed as

$$X = M_x + \frac{z_y}{r_b}\,\sigma_x. \tag{4}$$


For the purpose of illustrating the principle of how an equation developed
for the purpose of computing cutting-scores on a test can be used in
psychophysics we apply equation (4) to the aesthesiometer data.

It is readily calculated that $M_x$ = 10, and $\sigma_x$ = 1.414. From the data
the total of 194 = 93 + 66 + 29 + 5 + 1 indicates that of 500 applications
of the stimuli 38.8% gave rise to the two-point experience. Thus $z_y$, as the
standard-score value yielding a normal curve tail of .388, is equal to .2845.
Finally, $r_b$, as calculated from the standard formula $[(M_p - M_x)/\sigma_x](p/y)$,
is equal to .905.

Now, from psychophysical principles the limen value of X is that
corresponding to the point of dichotomy $z_y$ through equation (4), because
such an X value causes the probabilities of 1- or 2-point experience to be the
same. We have in test theory the corresponding "principle of equal likelihood"
which defines, for example, a critical cutting point in terms of stanine score
as one which allows a candidate to have a .5 probability of being a successful
pilot.

From these considerations, we see that the desired stimulus limen is
obtained from substitution into equation (4) as X = 10 + (.2845/.905)(1.414)
= 10.444. This value differs by about .2 from the limens obtained
by the traditional process which are all in the neighborhood of 10.6.
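The arithmetic of this example can be retraced as follows. This is a sketch under stated assumptions: the function name is hypothetical, the separations are taken as 12, 11, 10, 9, and 8 mm (the values implied by the reported mean of 10 and standard deviation of 1.414), and `NormalDist` replaces the normal-curve table, so the last decimals may differ slightly from the hand computation.

```python
from statistics import NormalDist

SEPARATIONS = [12, 11, 10, 9, 8]    # X in millimeters (assumed from the summary statistics)
TWO_POINT = [93, 66, 29, 5, 1]      # two-point reports out of 100 trials at each X
TRIALS = 100


def stimulus_limen(xs, hits, trials):
    """Return (M_x, sigma_x, z_y, r_b, limen) following equation (4)."""
    mean_x = sum(xs) / len(xs)
    sd_x = (sum((x - mean_x) ** 2 for x in xs) / len(xs)) ** 0.5
    p = sum(hits) / (trials * len(xs))        # overall proportion of two-point reports
    z_cut = NormalDist().inv_cdf(1.0 - p)     # standard score leaving a tail of p
    ordinate = NormalDist().pdf(z_cut)        # y, the normal ordinate at z_cut
    mean_upper = sum(x * h for x, h in zip(xs, hits)) / sum(hits)   # M_p
    r_b = (mean_upper - mean_x) / sd_x * (p / ordinate)             # biserial r
    limen = mean_x + (z_cut / r_b) * sd_x                           # equation (4)
    return mean_x, sd_x, z_cut, r_b, limen


print([round(v, 3) for v in stimulus_limen(SEPARATIONS, TWO_POINT, TRIALS)])
```

Run on the tabled data, the sketch gives values that agree with those reported in the text to within rounding.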


Three-Category Prediction Applied to the Method 
of Constant Stimulus Differences. 
The three-category theory developed by Michael and Perry (5) produces
the following pair of equations for critical cutting scores which are analogous
to equation (4):

$$X_i = M_x + \frac{z_{y_i}}{r_{tri}}\,\sigma_x, \quad (i = 1 \text{ or } 2). \tag{5}$$

Here $z_{y_1}$ is the standard score of the point of division on the marginal
distribution of y separating the upper category from the middle one, and similarly
$z_{y_2}$ separates the lowest category from the upper two. Correspondingly,
$X_1$ is the cutting score assuring an individual a .5 probability of membership
in the upper category, and a score of $X_2$ assures an individual a .5 probability
of inclusion in the lowest category. $r_{tri}$ is the triserial correlation between y and
X of the type developed by Jaspen (3).

In Guilford (1, 187) we find psychophysical data relating X (weight in 
grams) to p, q, and s (the respective proportions of 100 trials in which a given 
weight is judged greater than, equal to, and less than 200 grams). In terms 
of these symbols the data are summarized below: 


X   185   190   195   200   205   210   215
p   .05   .12   .15   .30   .55   .70   .85
q   .04   .18   .25   .42   .35   .18   .09
s   .91   .70   .60   .28   .10   .12   .06


It has been found in experiments of this type that the proportions p and s
tend to approximate ogive functions of X, a finding in agreement with the
previously discussed mathematical rationale of the ogive form in
psychophysics and test theory. It has also been established empirically that the
[...] cut off a succession of areas which are a symmetrical and
roughly normal function of base line distance.

From the data it is readily computed that $M_x$ = 200, $z_{y_1}$ = .2829,
and $\sigma_x$ = 10. From Jaspen's formula (3) we compute $r_{tri}$ to be .760, a statistic
which can reasonably be interpreted as a measure of the sensitivity of the
observer in a manner paralleling the previous development for $r_b$.
Hence

$$X_1 = 200 + \frac{(.283)(10)}{.760} = 203.72.$$

In psychophysical terminology the upper difference limen $DL_u$ = 3.72 on
the assumption that the [...] the lower difference
limen $DL_l$.
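The upper cutting score of equation (5) can be recomputed from the tabled proportions. In this sketch the function name is hypothetical, the triserial correlation is simply taken as the .760 reported in the text (Jaspen's formula is not reproduced here), and `NormalDist` replaces the normal-curve table.

```python
from statistics import NormalDist

WEIGHTS = [185, 190, 195, 200, 205, 210, 215]           # X in grams
P_GREATER = [0.05, 0.12, 0.15, 0.30, 0.55, 0.70, 0.85]  # proportion judged "greater"
R_TRI = 0.760   # reported triserial correlation; not recomputed in this sketch


def upper_cutting_score(xs, p_upper, r_tri):
    """Return (M_x, sigma_x, z_y1, X_1) following equation (5)."""
    mean_x = sum(xs) / len(xs)
    sd_x = (sum((x - mean_x) ** 2 for x in xs) / len(xs)) ** 0.5
    # Marginal proportion in the upper category, with equal trials at each weight.
    p_marginal = sum(p_upper) / len(p_upper)
    z_upper = NormalDist().inv_cdf(1.0 - p_marginal)
    return mean_x, sd_x, z_upper, mean_x + z_upper * sd_x / r_tri


mean_x, sd_x, z_upper, x_upper = upper_cutting_score(WEIGHTS, P_GREATER, R_TRI)
print(mean_x, sd_x, round(z_upper, 4), round(x_upper, 2), round(x_upper - mean_x, 2))
```

The result agrees with the text's $X_1$ = 203.72 and $DL_u$ = 3.72 to within rounding.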


The Method of Average Error and the Difference Limen 


Many workers in psychophysics have found that the probable error of
observations as found in the method of average error is roughly proportional
to the DL as determined by other methods. The following development
gives a mathematical rationale for this empirical fact in terms of category
prediction theory.

From the concluding example of the previous section it is clear that
with appropriate assumptions the DL is equal to $z_y\sigma_x/r_{tri}$. To express the
S.E. of response in stimulus units one only needs to substitute $y_x$ = 1 in
the regression equation $y_x = (r_{tri}/\sigma_x)(X - M_x)$ and solve for $e = X - M_x$.
Thus we have $1 = (r_{tri}/\sigma_x)e$, and S.E. = $\sigma_x/r_{tri}$. Hence P.E. = $.6745\sigma_x/r_{tri}$.

Let us now consider two different observers responding with different
sensitivity to the same stimulus situation and denote the various
mathematical symbols describing their response by prime and double-prime
notation. We have then

$$\frac{P.E.'}{DL'} = \frac{P.E.''}{DL''} \quad \text{or} \quad \frac{.6745\,\sigma_x/r_{tri}'}{z_y'\,\sigma_x/r_{tri}'} = \frac{.6745\,\sigma_x/r_{tri}''}{z_y''\,\sigma_x/r_{tri}''}.$$

Upon cancellation we have $z_y' = z_y''$.


Thus to account for the proportionality found by experiment one need 
make only the intuitively plausible assumption that relative to the dis- 
tribution of response for each observer the standard score of the point of 
dichotomy is always the same. 
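The cancellation above can be illustrated numerically. This is a small sketch with a hypothetical function name: two observers with different correlations but the same point of dichotomy are given the same ratio of P.E. to DL.

```python
def pe_to_dl_ratio(z_y, sd_x, r):
    """Ratio of the probable error to the difference limen for one observer."""
    probable_error = 0.6745 * sd_x / r     # P.E. in stimulus units
    difference_limen = z_y * sd_x / r      # DL under the stated assumptions
    return probable_error / difference_limen


# Two observers of differing sensitivity (r) but the same point of dichotomy.
print(pe_to_dl_ratio(0.2829, 10.0, 0.76), pe_to_dl_ratio(0.2829, 10.0, 0.40))
```

Both calls return .6745 divided by the point of dichotomy, whatever the value of the correlation or of the stimulus standard deviation.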


An Approximate Method of Correcting the DL for Non-Homoscedasticity 


As pointed out in Woodworth (6) the empirical justification of Weber’s 
Law does not depend on defining the DL in terms of a .5 probability. Other 
probabilities (if not too extreme) give equally good results. However, for
any probability other than .5 equation (5) must be more generally stated, 
because the mean of the column array no longer coincides with the point of 
dichotomy of the marginal distribution of y. 

As previously described and as summarized in equations (1), (2), and
(3) Katzell and Cureton presented a solution to the problem of determining
the probability of placement in a category for a given X value. They did
not, however, undertake the converse problem.

From equation (3),

$$z_y' = \frac{z_y - y_x}{\sigma_{yx}},$$

it is possible to derive a value of X which will permit the prediction of
category membership at any desired level of probability. By substituting
$(r_b/\sigma_x)(X - M_x)$ for $y_x$ [through use of equation (1)], and $\sigma_y\sqrt{1 - r_b^2}$ for
$\sigma_{yx}$ [through use of equation (2)], with $\sigma_y$ = 1 since y is taken as a unit
normal variable, equation (3) may be rewritten after some manipulation as

$$z_y' = \frac{z_y\sigma_x - r_b(X - M_x)}{\sigma_x\sqrt{1 - r_b^2}}. \tag{6}$$

By means of a few algebraic steps equation (6) may be transformed to
give X as a function of $z_y'$ as follows:

$$X = M_x + \frac{z_y\sigma_x}{r_b} - \frac{\sigma_x\sqrt{1 - r_b^2}}{r_b}\,z_y'. \tag{7}$$

In the present setting $z_y'$ is determined by the probability used to define
the difference limen. It will be noted that for p = .5, $z_y'$ = 0, and equation
(7) is essentially the same as equation (5).

In test theory the assumption of homoscedasticity is a plausible one,
but in psychophysics Weber's Law implies that column variance is less
for small stimuli than for large stimuli. To obtain a modified form of equation
(7), more in line with these considerations, we multiply the column standard
deviation $\sigma_{yx}$ by the factor $X/M_x$. Thus we have the new equation,

$$\sigma_{yx} = \sqrt{1 - r_{tri}^2}\,(X/M_x)\,\sigma_y. \tag{8}$$

We now follow through the formal steps used in developing equation (7)
employing $\sqrt{1 - r_{tri}^2}\,(X/M_x)$ in $\sigma_{yx}$ instead of $\sqrt{1 - r_b^2}$, and solve for X,
obtaining after lengthy algebra the new equations

$$X_i = \frac{M_x^2\,r_{tri} + z_{y_i}\sigma_x M_x}{M_x\,r_{tri} + \sqrt{1 - r_{tri}^2}\,\sigma_x z_y'}, \quad (i = 1 \text{ or } 2). \tag{9}$$

In the development of (9) it is necessary to assume that the redistribution
of column variance introduced by equation (8) (i.e., less variance in
columns below the mean and more column variance above the mean) does
not influence seriously the properties of the regression line and the marginal
distribution of y used in developing equation (7). Hence equation (9) can
give approximate results only, and is intended merely to give some estimate
of the influence of Weber's Law on cutting points.

With these understandings let us now compare the results yielded by 
equations (7) and (9) when applied to the weight-lifting data previously 
presented, and using an upper limen based on a probability of .7 of inclusion 
in the “greater” category. The value .7 was the limen probability used by 
Binet for mental age, and we choose this value to accentuate the parallel 
between test and psychophysical theory. 

For p = .7, $z_y'$ = -.5244, and substituting this value and previously
computed statistics into equation (9) we obtain

$$X_1 = \frac{(200)^2(.760) + (.2829)(10)(200)}{(200)(.760) + \sqrt{1 - (.760)^2}\,(10)(-.5244)} = 208.4.$$

A corresponding use of equation (7) yields

$$X = 200 + \frac{(.2829)(10)}{.760} - \frac{\sqrt{1 - (.760)^2}\,(10)(-.5244)}{.760} = 208.2.$$

Apparently, then, the estimated correction for the influence of Weber's
Law is roughly represented by the discrepancy between a $DL_u$ of 8.4 and
one of 8.2. The smallness of this difference (.2) is probably caused by the
relatively small range of weights used (8 grams) compared to the average
weight (200 grams).
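The two computations can be compared side by side. This is a sketch with hypothetical function names: the inputs are the statistics quoted in the text ($M_x$ = 200, $\sigma_x$ = 10, correlation .760, point of dichotomy .2829), and `NormalDist` supplies the standard score for the chosen probability.

```python
from statistics import NormalDist


def limen_equal_variance(mean_x, sd_x, r, z_cut, prob):
    """Equation (7): cutting score for a given probability, equal column variance."""
    z_prime = NormalDist().inv_cdf(1.0 - prob)   # -.5244 for prob = .7
    return mean_x + z_cut * sd_x / r - (sd_x * (1.0 - r * r) ** 0.5 / r) * z_prime


def limen_weber(mean_x, sd_x, r, z_cut, prob):
    """Equation (9): the same cutting score with column sigma proportional to X."""
    z_prime = NormalDist().inv_cdf(1.0 - prob)
    numerator = mean_x**2 * r + z_cut * sd_x * mean_x
    denominator = mean_x * r + (1.0 - r * r) ** 0.5 * sd_x * z_prime
    return numerator / denominator


x_eq7 = limen_equal_variance(200.0, 10.0, 0.760, 0.2829, 0.7)
x_eq9 = limen_weber(200.0, 10.0, 0.760, 0.2829, 0.7)
print(round(x_eq7, 1), round(x_eq9, 1))
```

With a probability of .5 both functions fall back to the cutting score of equation (5), since the standard score for that probability is zero.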


REFERENCES 


1. Guilford, J. P. Psychometric methods. New York: McGraw-Hill Book Company, 1936. 

2. Guilford, J. P., and Michael, W. B. The prediction of categories from measurements. 
Beverly Hills, California: Sheridan Supply Company, 1949. 

3. Jaspen, N. Serial correlation. Psychometrika, 1946, 11, 23-30. 

4. Katzell, R. A., and Cureton, E. E. Biserial correlation and prediction. J. Psychol., 1947, 
24, 273-278. 

5. Michael, W. B., and Perry, N. C. The prediction of membership in a trichotomous
dependent variable from scores in a continuous independent variable. Educ. psychol.
Meas., 1952, 12, 368-391.

6. Woodworth, R. S. Experimental psychology. New York: Henry Holt and Company, 
1938. 


Manuscript received 4/6/53
Revised manuscript received 5/29/53


PSYCHOMETRIKA—VOL. 18, NO. 4
DECEMBER, 1953 


NOTES ON AN APPROXIMATION METHOD FOR FITTING 
PARABOLIC EQUATIONS TO EXPERIMENTAL DATA* 


A. CHAPANIS 
THE JOHNS HOPKINS UNIVERSITY 


When a numerical transformation of raw data is used only to simplify 
the arithmetic of curve fitting, the transformation may lead to undesirable and 
even highly distorted results. This principle is illustrated with an approxima- 
tion method of fitting parabolic equations to experimental data, as described 
recently in texts by Johnson and Lewis. Although the approximation method 
will never yield as good fits as the exact, least-squares method, satisfactory 
results are in general achieved whenever the transformed scores yield a linear 
plot as a function of X. The principal difficulty with the method is that some 
data which fall along a parabola may not yield a linear plot of the transformed
scores versus X, and so cannot be fitted satisfactorily by the approximation
method. 


Introduction 


The method of least squares is commonly used for fitting empirical 
and theoretical curves to experimental data. An important feature of this
method is that it defines the "best-fitting" line for a set of data as that line
which minimizes the sum of the squared differences between observed and 
calculated values. Many problems of curve fitting also involve the use of 
some sort of numerical transformation of the data, the most common trans- 
formation in psychology, perhaps, being a logarithmic one. There are, how- 
ever, some important consequences of using numerical transformations in 
combination with the method of least squares for curve-fitting problems. 
These are pointed out briefly in an article by Mueller (3), but are generally 
ignored in most statistics textbooks written for psychologists. This note 
illustrates one of these consequences with a practical example.

Briefly the issue is this: When data are first transformed and then
treated by the method of least squares, the sum of the squared differences
between observed and calculated values is no longer a minimum. The double
numerical treatment minimizes the sum of the squared differences between
observed and calculated transformed values. Sometimes, as in many
psychophysical problems, this is precisely the result the experimenter wants to
achieve. In many visual problems, for example, variability is constant with
respect not to arithmetic values of the stimuli, but rather to their logarithmic
transforms. In such instances, it is meaningful to use the transformed scores
directly in curve fitting. In other instances, however, the numerical
transformation may have no significance other than to simplify the arithmetic
involved in curve fitting. Under these circumstances, the use of a numerical
transformation may lead to undesirable or even to highly distorted results.
It is this point which will be illustrated in this note.

*This study was done in cooperation with [...] Systems Division, Naval
Research Laboratory, under Contract N5-ori-166, Task Order I, between the
Office of Naval Research and The Johns Hopkins University. This is Report
No. 166-1-156, Project Designation [...]. The author is indebted to
Dr. Hermann von Schelling, of the Naval Medical Research Laboratory,
U. S. Naval Submarine Base, New London, [...] for technical advice.
Miss Judith T. Parker and Mr. William T. Pollock [...] the tedious
computations required for this note.


The Least-Squares Solution for a Parabolic Equation 


Let us consider the case of an investigator who wants to fit a parabolic 
equation to a set of experimental points. Given a set of N points defined 
by the rectangular coordinates, X, Y, the desired equation is of the form 


$$Y' = aX^2 + bX + c, \tag{1}$$


where Y' is the value of Y predicted from equation (1), X is the abscissa 
value of the several empirical points, and a, b, and c are constants to be 
determined. 

The least-squares solution for the constants, a, b, and c, is straightforward
but tedious because it involves three simultaneous equations

$$\begin{aligned}
\sum X^2Y &= a\sum X^4 + b\sum X^3 + c\sum X^2,\\
\sum XY &= a\sum X^3 + b\sum X^2 + c\sum X,\\
\sum Y &= a\sum X^2 + b\sum X + Nc,
\end{aligned} \tag{2}$$

and the computation of seven sums: $\sum X^2Y$, $\sum XY$, $\sum Y$, $\sum X^4$, $\sum X^3$,
$\sum X^2$, and $\sum X$, some of which make use of higher powers of X. (See any
of a number of textbooks, e.g., Peters and Van Voorhis [4, 429-431], for
arithmetic details.) An important feature of this solution, however, is that
it minimizes the sum of the squared differences between the observed and
predicted Y-values, i.e., $\sum (Y - Y')^2$ is a minimum. This statement may
be paraphrased for emphasis as follows: Any other solution for the three
constants, a, b, and c, must of necessity yield a poorer fit than the
least-squares solution outlined above. For this [...] squares solution as the
exact [...]

Recently Lewis, in [...] psychology, has described [...] as well. For most
practical [...] important advantages
to commend it: (1) it is much simpler to use than the exact technique
discussed above, and (2) it provides a "reduction test" for the empirical data.
Unfortunately, neither author points out that the method is only an
approximation, and neither discusses the exact least-squares solution described
above. This note shows that under certain special circumstances, the
"reduction test" aspect of the approximation will fail with the result that the
technique will yield equations which fit parabolic data very poorly or not at all.
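For reference, the exact least-squares solution of the three simultaneous equations (2) can be sketched in a few lines. The function name is hypothetical, and the elimination routine is only one of several ways of solving the equations; the sample points are invented and lie exactly on a parabola.

```python
def fit_parabola_exact(xs, ys):
    """Solve the normal equations (2) for a, b, c in Y' = aX^2 + bX + c."""
    n = len(xs)
    sx = sum(xs)
    sx2 = sum(x**2 for x in xs)
    sx3 = sum(x**3 for x in xs)
    sx4 = sum(x**4 for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x**2 * y for x, y in zip(xs, ys))
    # Augmented matrix of the three normal equations.
    rows = [
        [sx4, sx3, sx2, sx2y],
        [sx3, sx2, sx, sxy],
        [sx2, sx, n, sy],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    a, b, c = (rows[i][3] / rows[i][i] for i in range(3))
    return a, b, c


xs = [0, 1, 2, 3, 4]
ys = [2 * x**2 - 3 * x + 1 for x in xs]   # invented points on Y = 2X^2 - 3X + 1
print(fit_parabola_exact(xs, ys))
```

The seven sums named in the text appear as the seven accumulated quantities at the top of the function.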


A Description of the Approximation Method 
First, let us look at the essential features of Lewis' solution. We want
to fit an equation of the type

$$Y'' = dX^2 + eX + f. \tag{3}$$

This is the same equation as (1) above, but we have replaced Y' by Y'', and
the constants, a, b, and c, by d, e, and f, in order to keep Lewis' method
distinct from the exact least-squares method. Now choose any point, $X_1$, $Y_1$,
from the experimental data and form the equation

$$Y_1 = dX_1^2 + eX_1 + f. \tag{4}$$

Subtract this expression from the immediately preceding one, thus

$$Y'' - Y_1 = d(X^2 - X_1^2) + e(X - X_1). \tag{5}$$

This can be rewritten

$$\frac{Y'' - Y_1}{X - X_1} = d(X + X_1) + e \tag{6}$$

or

$$\frac{Y'' - Y_1}{X - X_1} = (e + dX_1) + dX. \tag{7}$$

In this last expression, e and $dX_1$ are constants and so may be replaced by
the single constant g. In addition, let us define

$$Z' = \frac{Y'' - Y_1}{X - X_1}. \tag{8}$$

Thus equation (7) reduces to

$$Z' = g + dX, \tag{9}$$

the equation for a straight line.
How does one fit parabolas with this technique? First, we start with a set
of empirical points, X, Y. Choose any convenient point, $X_1, Y_1$, and form, for
each set of experimental coordinates, the value $Z = (Y - Y_1)/(X - X_1)$.
Now we have pairs of coordinates, X, Z. Use the least-squares solution for
a straight line to find the constants g and d. Solve for e by the equation
$$e = g - dX_1, \tag{10}$$




and for f by the equation
$$f = Y_1 - dX_1^2 - eX_1. \tag{11}$$


Equation (11) comes from the identity in equation (4). 

Note, incidentally, that if the $X_1, Y_1$ selected is an experimental point,
the least-squares solution for a straight line uses N - 1 instead of N points,
because when $X = X_1$, Z is indeterminate. As suggested later by Lewis
(2, 80), however, $X_1, Y_1$ may be a point selected from a curve drawn
freehand through the data. In this case, of course, there are N points available
for the least-squares solution.
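The fitting procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch, assuming NumPy is available; the function name `lewis_fit` and the check data are hypothetical and come neither from Lewis nor from this note. Points with $X = X_1$ are dropped because Z is indeterminate there.

```python
import numpy as np


def lewis_fit(x, y, x1, y1):
    """Fit Y'' = d*X**2 + e*X + f by the approximation method.

    (x1, y1) is the chosen reference point. `lewis_fit` is a hypothetical
    name used only for this sketch.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = x != x1                      # Z is indeterminate where X = X1
    z = (y[keep] - y1) / (x[keep] - x1)
    d, g = np.polyfit(x[keep], z, 1)    # straight line Z' = g + d*X, eq. (9)
    e = g - d * x1                      # equation (10)
    f = y1 - d * x1**2 - e * x1         # equation (11)
    return d, e, f


# Hypothetical check: six points lying exactly on Y = -X**2 + 10*X + 2.
xs = [0, 1, 2, 3, 4, 5]
ys = [2, 11, 18, 23, 26, 27]
d, e, f = lewis_fit(xs, ys, x1=2, y1=18)
print(d, e, f)
```

Because the check points lie exactly on a parabola, the transformed values Z are exactly linear in X and the constants come back as approximately -1, 10, and 2, up to floating-point error.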

The simplicity of the approximation method comes principally from the
fact that it involves only the two simultaneous equations required for a
straight line,
$$\begin{aligned} \sum XZ &= d\sum X^2 + g\sum X, \\ \sum Z &= d\sum X + Ng, \end{aligned} \tag{12}$$
and the computation of only four sums, $\sum X$, $\sum Z$, $\sum XZ$, and
$\sum X^2$, none of which makes use of a power higher than 2.

It is readily apparent, however, that the approximation can never be
as good as the exact solution, because $\sum (Y - Y'')^2$ can never be as small as
$\sum (Y - Y')^2$. The reason, of course, is that the approximation minimizes
not $\sum (Y - Y'')^2$, but rather $\sum (Z - Z')^2$. From equation (8) it is apparent
that Z is a complex numerical transformation involving a ratio, two variables,
and two constants. Some important features of the approximation technique
are the following:


(1) In general, there will always be a discrepancy between the approximate
and the exact parabola. However, if the N empirical points lie exactly
on a parabola, the approximation yields the true solution. It follows that
the approximation is better, the closer the given points fall around a true
parabola.

(2) The approximate parabola always goes through the point $X_1, Y_1$.
That this is so is apparent from equations (3) and (4). When X is equal to $X_1$,
$Y''$ is equal to $Y_1$. Thus, the closer the point, $X_1, Y_1$, is to the best parabola,
the better is the approximation. Conversely, the more distant the point,
$X_1, Y_1$, from the best parabola, the poorer is the approximation.

(3) The approximation overweights points close to $X_1, Y_1$ and
underweights points distant from it. [...] weights are proportional to
$1/|X - X_1|$. [...] be selected from the center of the X-range.

(4) Although the approximation method does not minimize $\sum (Y - Y'')^2$,
it satisfies the equation $\sum (Y - Y'') = 0$.

(5) The approximation solution is invariant under certain simple
transformations. Equation (8) shows that the addition (or subtraction) of the
same constant to every X- and/or Y-coordinate does not change Z, and so
will not change the value of $\sum (Y - Y'')^2$. The addition of a constant to the
X- (or Y-) coordinates merely shifts the origin to the right or left (or up or
down) without affecting the relationship of the plotted points to one another.

Although it is perhaps not immediately apparent, it is true nonetheless
that multiplying every X-coordinate by the same constant leaves the solution
essentially unchanged, i.e., $\sum (Y - Y'')^2$ is unaffected, even though the
Z-values and the constants d and e do change. What happens in this instance
may perhaps be made clear if it is recalled that multiplying each X-coordinate
by a constant is equivalent to stretching (or compressing) the scale along
the abscissa without affecting the vertical distances along the ordinate.
It is easy to visualize that such a stretching will not alter the vertical distance
of an observed point from a predicted point, and hence the squared residual
will remain unaltered.

Finally, we can visualize the effect of multiplying the Y-coordinates
by a constant, K. This will stretch or compress the Y-scale. Each Y-residual
will be increased K times, and each squared residual will be increased $K^2$
times. Thus, multiplying every Y-coordinate by the constant, K, yields
a final $\sum (Y - Y'')^2$ value which is $K^2$ times the original one.
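Features (2), (4), and (5) can be checked numerically. The sketch below assumes NumPy; the seven data points, the reference point, and the helper names `lewis_fit` and `residual_sums` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def lewis_fit(x, y, x1, y1):
    """Approximation method: regress Z on X, then recover d, e, f."""
    keep = x != x1                      # drop the point where Z is indeterminate
    z = (y[keep] - y1) / (x[keep] - x1)
    d, g = np.polyfit(x[keep], z, 1)
    e = g - d * x1
    f = y1 - d * x1**2 - e * x1
    return d, e, f


def residual_sums(x, y, x1, y1):
    """Return (sum of squared residuals, sum of residuals) for the fit."""
    d, e, f = lewis_fit(x, y, x1, y1)
    resid = y - (d * x**2 + e * x + f)
    return np.sum(resid**2), np.sum(resid)


# Hypothetical data that scatter around a parabola without lying on one.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([3.0, 10.0, 19.0, 22.0, 27.0, 26.0, 27.0])
x1, y1 = 3.0, 22.0                      # reference point taken from the data

base_sse, base_sum = residual_sums(x, y, x1, y1)

# Feature (2): the fitted curve passes through (x1, y1).
d, e, f = lewis_fit(x, y, x1, y1)
value_at_x1 = d * x1**2 + e * x1 + f

# Feature (5): shifting X and Y, or rescaling X, leaves the sum unchanged;
# rescaling Y by K multiplies it by K**2.
shift_sse, _ = residual_sums(x + 7.0, y - 4.0, x1 + 7.0, y1 - 4.0)
scale_x_sse, _ = residual_sums(2.5 * x, y, 2.5 * x1, y1)
K = 3.0
scale_y_sse, _ = residual_sums(x, K * y, x1, K * y1)

print(base_sum, value_at_x1, base_sse, shift_sse, scale_x_sse, scale_y_sse)
```

The residual sum `base_sum` comes out at zero up to rounding, which is feature (4); it follows from combining the two normal equations for the straight line, since $(X - X_1)(Z - Z') = Y - Y''$.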


A Practical Illustration 


As a practical illustration of the principles discussed above, we shall use
the set of raw data appearing in columns (1) and (2) of Table 1. A plot of
these points, and of the best-fitting parabola computed by means of the
three simultaneous equations (2), is shown in Figure 1. The equation of
the parabola is

$$Y' = -.9570X^2 + 9.670X + 29.62.$$

$\sum (Y - Y')^2$ computed from this equation is 219.2.


TABLE 1

Parabolic Equations Fit to a Set of Data by the
Approximation Method*

    (1)          (2)          (3)                                            (4)
     X            Y           Best-Fitting Equations by                      Sum of (Y - Y'')^2
                              Lewis' Approximation Method

    11.2         24           Y'' =  1.0[?]5X^2 - 17.79X + 94.60             11,960
    [illegible]  [illegible]  Y'' =   .6807X^2 - 14.36X + 90.51              10,590
    [illegible]  [illegible] †Y'' =  -.8790X^2 + 9.234X + 28.56                 271.9
    [illegible]  [illegible]  Y'' =  -.6276X^2 + 4.646X + 43.02                 734.9
    [illegible]  [illegible]  Y'' =  -.9332X^2 + 7.696X + 39.64                 804.8
    [illegible]  [illegible]  Y'' = -1.438X^2 + 14.62X + 93.49                  693.6
    [illegible]  [illegible] †Y'' =  -.9355X^2 + 9.257X + 30.98                 225.7
    [illegible]  [illegible]  Y'' =  -.6220X^2 + 4.116X + 45.78               1,020.
    [illegible]  [illegible]  Y'' = -1.148X^2 + 10.08X + 35.92                  917.8
     3.8         53           Y'' =  -.9673X^2 + 9.595X + 30.51                 226.9
    [illegible]  [illegible]  Y'' = -1.238X^2 + 12.02X + 29.08                  510.7
     2.0         43          †Y'' =  -.8740X^2 + 9.318X + 27.86                 303.1
     1.5         42          †Y'' =  -.9034X^2 + 9.106X + 30.37                 224.8
      .5         39           Y'' = -1.640X^2 + 14.71X + 32.05                2,588.
    [illegible]  25           Y'' = -1.981X^2 + 19.03X + 23.12                3,185.

*Columns (1) and (2) contain the raw data; equations of the best-fitting parabolas when each pair of
coordinates is taken as $X_1, Y_1$ in Lewis' approximation method appear in column (3); and $\sum (Y - Y'')^2$ for
each equation is given in column (4). Daggers (†) identify the approximate solutions which do not differ markedly
from the exact, least-squares solution.

Column (3) of Table 1 gives the 15 parabolic equations obtained when
each pair of empirical coordinates is taken successively as the $X_1, Y_1$ in
the approximation method, and Figures 2, 3, and 4 each show curves for
three of the equations computed by the approximation. In each figure the
points are those shown in Figure 1 and listed in columns (1) and (2) of Table 1.
The solid curve is the best-fitting parabola shown in Figure 1. The
interrupted lines are three solutions achieved by the approximation method.
Arrows identify the points used as $X_1, Y_1$ in the approximation solutions.

Finally, column (4) of Table 1 gives the $\sum (Y - Y'')^2$ values for each
of the 15 parabolic equations computed by the approximation. The constants
of the equations in Table 1 have been rounded off to four significant digits
from intermediate calculations which were carried out to more significant
figures. This is more accuracy than is usually warranted by this kind of
problem, and rounding off intermediate calculations will change the values
somewhat. Actually, all computations were performed independently by two
persons, one of whom carried out intermediate calculations to many
significant figures, the other of whom rounded off intermediate calculations to
no more than four significant digits. The final discrepancies in the
$\sum (Y - Y'')^2$ values between the two computers generally amounted to less
than 1 per cent and never exceeded 3 per cent.

Table 1 confirms what we should have expected on theoretical grounds:
in every instance the approximations yield $\sum (Y - Y'')^2$ values which are
greater than the $\sum (Y - Y')^2$ resulting from the exact formula. Further,
only 5 of the 15 approximate equations (those marked by daggers) are
even reasonably good fits to the data. Note that for all five of these equations
the $X_1, Y_1$ values are close to the best-fitting parabola (these are the points
designated by arrows in Figure 1), thus agreeing with the second statement
above. The remaining 10 of the 15 approximate equations yield $\sum (Y - Y'')^2$
values which are two to fifty times as large as the $\sum (Y - Y')^2$ value obtained
from the exact equation. In a few instances (see Figure 3) the approximation
method yields equations which are obviously bad fits to the data; and, in
this example, two fits (Figure 4) are manifestly absurd. Note that the very
bad fits occur when $X_1, Y_1$ is selected from either end of the X-range, thus
agreeing with the third statement of the preceding section. Finally, we
can see from Figures 2, 3, and 4 that the equations achieved by the
approximation method always pass through $X_1, Y_1$.


FIGURE 1
The Best-Fitting Parabola Computed by the Exact
Least-Squares Technique [Equations (2)]


FIGURE 2
Three Solutions by the Approximation Method for
the Data in Figure 1


FIGURE 3
Three Solutions by the Approximation Method for
the Data in Figure 1


FIGURE 4
Three Solutions by the Approximation Method for
the Data in Figure 1


A Limitation of the Approximation Method


Having worked through an illustrative example, we are now in a better
position to analyze the true nature of the limitation which applies to the
approximation method. [Incidentally, the fact that our illustrative example
included the vertex of a parabola (see Figure 1), whereas Lewis' example deals
with only one leg of a parabola (2, 78), is not the source of the difficulty.]
In describing his transformation Lewis (2, 77) states, "... equation (3)
may be used to represent a set of experimental data if a plot of


- If Z in our terminology 
[see equation (8)], yields a linear plot as a function of X, the approximation 


X, Z. As a result, the experimenter may conclude that a parabolic equation
is not the one to use for his data. If, however, he decides to go ahead and
fit a parabola anyway, he will find that the equation resulting from the use
of the approximation will fit the data very poorly. This is exactly what
happened to produce the 10 poor fits in the example used in the preceding
section.

Under what circumstances will parabolic data not yield a linear plot
of X, Z? In general, this will occur when $X_1, Y_1$ is located near some other
point which deviates moderately in Y-value from $Y_1$. The reason for this is
contained in the third statement made earlier about the approximation
method. Points close to $X_1, Y_1$ are weighted disproportionately when they
are transformed to Z-values.
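The disproportionate weighting can be made explicit with a short derivation. It is a sketch that uses only the definition of Z and equation (8), and it is not written out in this form in the note itself.

```latex
% Residual of the straight-line fit, expressed in terms of the Y-residual:
Z - Z' = \frac{Y - Y_1}{X - X_1} - \frac{Y'' - Y_1}{X - X_1}
       = \frac{Y - Y''}{X - X_1}

% Hence the quantity the approximation actually minimizes is
\sum (Z - Z')^2 = \sum \frac{(Y - Y'')^2}{(X - X_1)^2}
```

Each residual $Y - Y''$ therefore enters with weight $1/|X - X_1|$, so a point lying very near $X_1$ with even a moderate Y-deviation can dominate the sum.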

To illustrate the nature and importance of the overweighting, consider
the curves in Figure 5. The solid curve is of the equation

$$Y = -X^2 + 10X + 2.$$

Let us suppose that we had a set of data which could be described by this
curve but that there was a small amount of variability in the data as indicated
by the dashed boundary lines. Now let us take the point, X = 0, Y = 2, as
[...]
in Z for X-values close to $X_1$. This extreme distortion of Z occurs for values
of X close to $X_1$ irrespective of where $X_1$ lies within the X-range. If [...]
X-range, however, there is a balancing of
[...] both ends of the X-range.


FIGURE 5
Some Hypothetical Data


FIGURE 6
Z-Transformations of the Boundary Lines in
Figure 5 for $X_1 = 0$, $Y_1 = 2$


To illustrate more fully the importance of this distortion in curve fitting,
take the five points whose coordinates are: 0, 2; .03, 1; 1, 11; 3, 23; and 5, 27.
These points are shown in Figure 7. If we select the point 0, 2 as our $X_1, Y_1$,
and plot Z as a function of X, we have the result shown in Figure 8. Thus,
although Figure 7 shows that the five points lie on an almost perfect parabola,
Figure 8 shows that Z is by no means a linear function of X.


FIGURE 7
Hypothetical Data Fit by the Exact Least-
Squares Method (Solid Line), and by the
Approximation Method (Dashed Line)


FIGURE 8
Z-Transformations of the Points in Figure 7
for $X_1 = 0$, $Y_1 = 2$




Fitting a straight line to the points in Figure 8 gives the solution

$$Z' = 5.60X - 15.7.$$

The parabola corresponding to this equation is

$$Y'' = 5.60X^2 - 15.7X + 2.00,$$

and is plotted as the dashed line in Figure 7. The result is highly
unsatisfactory solely because of the extreme weighting of the single point .03, 1,
when it is transformed into Z (Figure 8). The solid line in Figure 8 best
fits the four points there, but the best-fitting parabola would require that the
line describing Z as a function of X be more nearly like the dashed one in
Figure 8.
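The five-point example can be reworked numerically. The sketch below assumes NumPy and takes the second point as (.03, 1), the reading under which the printed line $Z' = 5.60X - 15.7$ is reproduced; the variable names are hypothetical, and `np.polyfit` with degree 2 stands in for the three simultaneous equations (2) of the exact method.

```python
import numpy as np

# The five points of the example; (0, 2) serves as the reference point X1, Y1.
x = np.array([0.0, 0.03, 1.0, 3.0, 5.0])
y = np.array([2.0, 1.0, 11.0, 23.0, 27.0])
x1, y1 = 0.0, 2.0

# Approximation method: transform to Z and fit a straight line to the
# four points that remain once the reference point is dropped.
keep = x != x1
z = (y[keep] - y1) / (x[keep] - x1)
d, g = np.polyfit(x[keep], z, 1)
e = g - d * x1                      # equation (10)
f = y1 - d * x1**2 - e * x1         # equation (11)

# Exact least-squares parabola fitted directly to all five points.
a, b, c = np.polyfit(x, y, 2)

sse_approx = np.sum((y - (d * x**2 + e * x + f)) ** 2)
sse_exact = np.sum((y - (a * x**2 + b * x + c)) ** 2)

print(f"Z'  = {d:.2f}X {g:+.1f}")
print(f"Y'' = {d:.2f}X^2 {e:+.1f}X {f:+.2f}")
print("sum of squares, approximation:", sse_approx)
print("sum of squares, exact:        ", sse_exact)
```

The transformed value for the point (.03, 1) is about -33.3, far from the other three Z-values of 9, 7, and 5. That single value pulls the fitted slope d positive, so the approximate parabola opens upward although the data plainly curve downward.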

This, of course, is an extreme example, but, as we have seen from the 


previous illustration, it is a circumstance which could arise with practical 
experimental data. 


REFERENCES 


1. Johnson, L. H. Nomography and empirical equations. New York: John Wiley and Sons,
1952.
2. Lewis, D. Quantitative methods in psychology. Iowa City, Iowa: The Bookshop, 1948.
3. Mueller, C. G. Numerical transformations in the analysis of experimental data. Psychol.
Bull., 1949, 46, 198-223.
4. Peters, C. C., and Van Voorhis, W. R. Statistical procedures and their mathematical
bases. New York: McGraw-Hill, 1940.






PSYCHOMETRIC SOCIETY
Statement of Receipts and Disbursements for Fiscal Year Ended June 30, 1953

RECEIPTS
Dues:
    Year        Members     Student Members
    1954            1
    1953          364            48
    1952           21             4
    1951            1
    1950            1
                  388            52                              $2,874.00*
Miscellaneous:
    Partial payments on 1953 dues for Student and Full
      Members**                                     $   49.00
    Overpayment on dues applied to 1954 dues             1.00
    Psychometric Corporation                             2.40
                                                                     52.40
Total Receipts                                                   $ 2,926.40

DISBURSEMENTS
Psychometric Corporation (90% of dues)                           $ 2,631.60
Miscellaneous Disbursements:
    Mimeographing and Printing                      $   81.55
    Stationery and Postage                              80.74
    Secretarial Services                                59.54
    Addressing and Mailing Charges                      18.79
    Telephone Calls                                     10.51
    [illegible]                                           .25
                                                                    251.38
Total Disbursements                                              $ 2,882.98

BALANCE
Bank Balance, July 1, 1952                                       $ 1,047.02
Receipts                                                           2,926.40
                                                                 $ 3,973.42
Expenditures, 1952-1953                                            2,882.98
Bank Balance, June 30, 1953                                      $ 1,090.44

*Member's dues for 1953 and 1954 are $7.00. Student's dues are $4.00.
Member's dues for all previous years are $5.00. Student's dues are $3.00.
**Partial payment of 1953 dues occurred as a result of the change from $5.00 to $7.00.




PSYCHOMETRIC CORPORATION
Statement of Receipts and Disbursements for Fiscal Year Ended June 30, 1953

Bank Balance, July 1, 1952                                              $ 5,438.76

RECEIPTS
    Subscriptions (less agency discounts)                    $3,146.90
    Psychometric Society (90% of dues)                        2,631.60
    [illegible]                                               1,218.75
    Sale of Psychometric Monographs 5-8 (less agency discounts) 287.90
    Miscellaneous                                                11.87
    Total Receipts                                                      $ 7,297.02

DISBURSEMENTS
    Printing Psychometrika
      Volume 17, Numbers 2-4 [...]                           $6,242.72
    Stipend of Assistant Editor
      Volume 16, Number 4 [...]                                 562.50
    Secretarial Services, stationery, postage                   801.88
    Publication of Psychometric Monographs 6-8
      (less contribution by Educational Testing Service of
      $915.76 toward printing of Monograph Number 7)          2,467.17
    Shipping back issues to new printer and establishing new
      Addressograph [...]                                       603.82
    Miscellaneous                                               173.76
    Total Disbursements                                                 $10,851.85

Bank Balance, June 30, 1953                                             $ 1,883.93


Estimated Cash Receipts and Disbursements, July 1, 1953-Dec. 31, 1953
(Excluding receipts for 1954 subscriptions)

Bank Balance, July 1, 1953                                              $ 1,883.93
Estimated Receipts
    Sale of back issues and [...]                                       $   500.00
Estimated Disbursements
    Printing Psychometrika, Volume 18, Numbers 2-4           $4,000.00
    Stipend to Assistant Editor, Volume 18, Numbers 1-4         360.00
    Secretarial Services, Stationery, [...]                     320.00
    Miscellaneous                                                20.00
    Total Estimated Disbursements                                       $ 4,700.00

Anticipated Cash Deficit as of Dec. 31 [...]
non-interest-bearing loans from Psych[...]                              $ 2,316.07


INDEX FOR VOLUME 18 


AUTHOR 
Abelson, Robert P., “A Note on the Neyman-Johnson Technique." 213-218.

Angoff, William H., “Test Reliability and Effective Test Length." 1-14.

Bordin, Edward S., “NORMAN FREDERIKSEN and W. B. SCHRADER, Adjustment to College.”
A Review. 187-188.

Carroll, John B., “An Analytical Solution for Approximating Simple Structure in Factor
Analysis.”

Chapanis, A., “[...] an Approximation Method for Fitting Parabolic Equations [...]

Christensen, Paul R. (With Russel F. Green, J. P. Guilford, and Andrew L. Comrey),
“A Factor-Analytic Study of Reasoning Abilities.” 135-160.

[...]

DuBois, Philip H. (With Goldine C. Gleser and Jane Loevinger), “Maximizing the
Discriminating Power of a Multiple-Score Test." 309-317.

Fleishman, Edwin A., “A Factor Analysis of Intra-Task Performance on Two
Psychomotor Tests.” 45-55.


French, John W., “PHILIP VERNON, The Structure of Human Abilities." A Review. 181-182.

Friedman, Gabriel (With Virginia Zachert), “The Stability of the Factorial Pattern of
Aircrew Classification Tests in Four Analyses." 219-221.

Fruchter, Benjamin, “Differences in Factor Content of Rights and Wrongs Scores."
257-265.

Gibson, W. A., “A Least-Squares Solution for Case IV of the Law of Comparative
Judgment." 15-21.

Gibson, W. A., “A Simple Procedure for Rearranging Matrices." 111-113.

Gleser, Goldine C. (With Jane Loevinger and Philip H. DuBois), “Maximizing the
Discriminating Power of a Multiple-Score Test." 309-317.

Goheen, Howard W. (With Melvin D. Davidoff), “A Table for the Rapid Determination
of the Tetrachoric Correlation Coefficient.” 115-121.

Goodman, Leo A., “A Further Note on ‘Finite Markov Processes in Psychology.’” 245.

Green, Russel F. (With J. P. Guilford, Paul R. Christensen, and Andrew L. Comrey),
“A Factor-Analytic Study of Reasoning Abilities.” 135-160.

Guilford, J. P. (With Russel F. Green, Paul R. Christensen, and Andrew L. Comrey),
“A Factor-Analytic Study of Reasoning Abilities.” 135-160.

Gulliksen, Harold, “Comments on Guttman’s Review of Theory of Mental Tests." 131-133.

Gulliksen, Harold, “A Generalization of Thurstone's Learning Function.” 297-307.

Guttman, Louis, “Image Theory for the Structure of Quantitative Variates.” 277-296.

Guttman, Louis, “Reliability Formulas that Do Not Assume Experimental Independence.”

Guttman, Louis, “A Special Review of HAROLD GULLIKSEN, Theory of Mental Tests.” 123-

Jones, Lyle V., “DOROTHY C. ADKINS and SAMUEL B. LYERLY, Factor Analysis of Reasoning
Tests.” A Review. 182-184.
Kao, Richard C. W., “Note on Miller's ‘Finite Markov Processes in Psychology.’” 241-243.

Katz, Leo, “A New Status Index Derived from Sociometric Analysis." 39-43.

Katz, Leo (With James H. Powell), “A Proposed Index of the Conformity of One
Sociometric Measurement to Another." 249-256.

Klein, L. R., “GERHARD TINTNER, Econometrics.” A Review. 95-96.

Loevinger, Jane (With Goldine C. Gleser and Philip H. DuBois), “Maximizing the
Discriminating Power of a Multiple-Score Test." 309-317.

Lord, Frederic M., “An Application of Confidence Intervals and of Maximum Likelihood
to the Estimation of an Examinee’s Ability.” 57-76.

Maritz, J. S., “Estimation of the Correlation Coefficient in the Case of a Bivariate Normal
Population When One of the Variables is Dichotomized.” 97-110.

Miller, G. A., “LLOYD A. JEFFRESS, Cerebral Mechanisms in Behavior, The Hixon
Symposium.” A Review. 184-187.

Perry, Norman C., “Some Applications of Attribute Prediction Theory to Psychophysics."
319-326.

Powell, James H. (With Leo Katz), “A Proposed Index of the Conformity of One
Sociometric Measurement to Another." 249-256.

Psychometric Corporation, Report of the Treasurer. 337.

Psychometric Society, Report of the Treasurer. 338.

Rippe, Dayle D., “Application of a Large Sampling Criterion to Some Sampling Problems
in Factor Analysis.” 191-205.

Thorndike, Robert L., “Who Belongs in the Family?" 267-276.

Webster, Harold, “Approximating Maximum Test Validity by a Non-Parametric Method."
207-211.

Wherry, Robert J. (With Ben J. Winer), “A Method for Factoring Large Numbers of
Items." 161-179.

Winer, Ben J. (With Robert J. Wherry), “A Method for Factoring Large Numbers of
Items." 161-179.

Zachert, Virginia (With Gabriel Friedman), “The Stability of the Factorial Pattern of
Aircrew Classification Tests in Four Analyses." 219-224.

Zimmerman, Wayne S., “A Revised Orthogonal Rotational Solution for Thurstone's
Primary Mental Abilities Test Battery." 77-93.


